Semester
Fall
Date of Graduation
2018
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Jeremy M. Dawson
Committee Co-Chair
Donald Adjeroh
Committee Member
Donald Adjeroh
Committee Member
Stephen DiFazio
Abstract
Bacterial communities found in and on the human body are used not only to study human health conditions but also to differentiate individuals, because their profiles are distinct. Human palm regions harbor comparatively diverse bacterial communities, which are indicative of population groups, lifestyles, geographic locations, age groups and health conditions. Sequences extracted from hypervariable region V3 of the bacterial 16S rRNA gene of hand bacterial samples from 9 different population groups were classified into Operational Taxonomic Units (OTUs) with the GreenGenes reference taxonomy using the RDP (Ribosomal Database Project) classifier. Frequencies of the identified OTUs were used to study dissimilarities between samples by calculating the Kullback-Leibler Divergence (KLD) between every pair of samples. In addition to OTU frequencies, the frequencies of nucleotide k-mers from each OTU sequence were used to study the dissimilarities between samples. Based on the structure, 65 nucleotides of the V3 hypervariable region were mapped into 47 elements, and the distributions of k-mers from these mapped elements were used to determine dissimilarities between samples. Furthermore, a new classification technique was applied in which sequences were clustered based on their k-mer frequency profiles and a unique signature was assigned to every cluster. Frequencies of these signature clusters were used to calculate the KLD between different samples. This method classifies the unknown sequences that were ignored in OTU-based methods. An ensemble learning method was applied to each of the above k-mer cases to identify the population group of a given hand bacterial sample. Samples were identified with 51-98% accuracy across the different cases of k-mer distribution. Samples were classified with greater accuracy using k-mer classified sequences than using OTU sequences.
Though applied to a small group of samples, these results provide a basis for the use of k-mer distributions in classifying and identifying individuals, an approach that could perform better on a broader range of time-varying datasets from other regions of the 16S rRNA gene or even from other bacterial genes, such as 23S rRNA.
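The k-mer comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the choice of k, the additive pseudocount used to avoid zero probabilities, and the symmetrized form of the divergence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_profile(sequences, k=3, pseudocount=1.0):
    """Return a smoothed k-mer probability distribution over the ACGT alphabet.

    The pseudocount is an illustrative assumption; it keeps every
    probability positive so the divergence below is always finite.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[v] for v in vocab) + pseudocount * len(vocab)
    return {v: (counts[v] + pseudocount) / total for v in vocab}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Both arguments must be distributions over the same keys.
    """
    return sum(p[x] * log2(p[x] / q[x]) for x in p if p[x] > 0)


def symmetric_kld(p, q):
    """Symmetrized divergence (an assumption; the abstract only says KLD)."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```

A pairwise dissimilarity between two samples would then be `symmetric_kld(kmer_profile(sample_a), kmer_profile(sample_b))`, where `sample_a` and `sample_b` are hypothetical lists of sequence strings. The same divergence functions apply unchanged to OTU or signature-cluster frequencies once those are normalized into distributions over a shared set of labels.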
Recommended Citation
Doppala, Thrisha, "Differentiating Human Populations Based on k-mer Classification of Hand Bacteria" (2018). Graduate Theses, Dissertations, and Problem Reports. 3720.
https://researchrepository.wvu.edu/etd/3720
Embargo Reason
Publication Pending