Document Type


Publication Date



Lane Department of Computer Science and Electrical Engineering


Background: While predicting continental-level ancestry from genomic information is relatively simple, distinguishing between individuals from closely associated sub-populations (e.g., from the same continent) remains a difficult challenge. Methods: We study the problem of predicting human biogeographical ancestry from genomic data under resource constraints. In particular, we focus on the case where the analysis is constrained to using single nucleotide polymorphisms (SNPs) from just one chromosome. We propose correlation-based and outlier-based methods to construct such ancestry-informative SNP panels. Results: We assessed the performance of the proposed SNP panels, derived from just one chromosome, using data from the 1000 Genomes Project, Phase 3. For continental-level ancestry classification, we achieved an overall classification rate of 96.75% using 206 SNPs. For sub-population-level ancestry prediction, we achieved the following average pairwise binary classification rates: sub-populations in Europe: 76.6% (58 SNPs); Africa: 87.02% (87 SNPs); East Asia: 73.30% (68 SNPs); South Asia: 81.14% (75 SNPs); America: 85.85% (68 SNPs). Conclusion: Our results demonstrate that a single chromosome (in particular, Chromosome 1), if carefully analyzed, could hold enough information for accurate prediction of human biogeographical ancestry. This has significant implications for the computational resources required for ancestry analysis, and for applications of such analyses, such as studies of genetic diseases, forensics, and soft biometrics.
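The abstract does not spell out how the correlation-based panel is built, so the following is only a minimal sketch of one generic way such a selection could work: rank SNPs by the absolute Pearson correlation between genotype (coded 0/1/2) and a binary population label, then keep the top k. The function name `rank_snps_by_correlation`, the synthetic allele frequencies, and the panel size are all assumptions made for illustration; they are not taken from the paper, and no 1000 Genomes data is used.

```python
import numpy as np

def rank_snps_by_correlation(genotypes, labels, k):
    """Return (indices of the k top-ranked SNPs, per-SNP |Pearson r|).

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2).
    labels:    (n_individuals,) array of 0/1 population labels.
    Hypothetical helper, not the authors' implementation.
    """
    g = genotypes - genotypes.mean(axis=0)   # center each SNP column
    y = labels - labels.mean()               # center the labels
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    denom[denom == 0] = np.inf               # monomorphic SNPs get r = 0
    corr = np.abs(g.T @ y) / denom
    top = np.argsort(corr)[::-1][:k]         # highest |r| first
    return top, corr

# Synthetic example (assumed values): 2 populations, 200 individuals each,
# 50 SNPs. Only the first 5 SNPs differ in allele frequency.
rng = np.random.default_rng(0)
n, m = 200, 50
freq_a = np.full(m, 0.5)
freq_b = np.full(m, 0.5)
freq_a[:5], freq_b[:5] = 0.1, 0.9

genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(n, m)),
    rng.binomial(2, freq_b, size=(n, m)),
]).astype(float)
labels = np.repeat([0.0, 1.0], n)

panel, corr = rank_snps_by_correlation(genotypes, labels, k=5)
print(sorted(panel.tolist()))
```

With more than two populations, the same ranking could be applied to each pair of populations in turn, which matches the pairwise binary setting the abstract reports; that extension is likewise an assumption rather than a description of the published method.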

Source Citation

Toma TT, Dawson JM, Adjeroh DA. Human ancestry indentification under resource constraints -- what can one chromosome tell us about human biogeographical ancestry? BMC Medical Genomics. 2018;11(S5). doi:10.1186/s12920-018-0412-4