Semester
Summer
Date of Graduation
2005
Document Type
Dissertation
Degree Type
PhD
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Elaine M Eschen
Committee Co-Chair
Cun-Quan Zhang
Abstract
Evidence from investigations of genetic differences among human beings shows that genetic diseases are often the result of genetic mutations. The most common form of these mutations is the single nucleotide polymorphism (SNP). A complete map of all SNPs in the human genome will be extremely valuable for studying the relationships between specific haplotypes and specific genetic diseases. Recent discoveries show that the human DNA sequence can be partitioned into long blocks in which genetic recombination has been rare. Inferring both haplotypes from chromosome sequences is therefore a biologically meaningful research topic, and one that poses combined mathematical and computational problems.
We are interested in the algorithmic implications of inferring haplotypes from long blocks of DNA that have not undergone recombination in populations. This assumption justifies a model of haplotype evolution: the haplotypes in a population evolve along a coalescent which, under the standard population-genetic assumption of infinite sites, is a perfect phylogeny when viewed as a rooted tree. The Perfect Phylogeny Haplotyping (PPH) Problem was introduced by Daniel Gusfield in 2002. A nearly linear-time solution to the PPH problem (O(nm α(nm)), where α is the extremely slowly growing inverse Ackermann function) has been given, but it is very complex and difficult to implement. So far, even the best practical solution to the PPH problem has a worst-case running time of O(nm²). D. Gusfield conjectured that a linear-time (O(nm)) solution to the PPH problem should be possible.
We settle Gusfield's conjecture by introducing a linear-time algorithm for the PPH problem. Different kinds of posets for haplotype matrices and genotype matrices are designed, and the relationships between them are studied. Since redundant calculations can be avoided through the transitivity of the partial ordering in posets, we design a linear-time (O(nm)) algorithm for the PPH problem that provides all possible solutions for an input. The algorithm is fully implemented, and the simulation shows that it is much faster than previous methods.
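To make the problem statement concrete, the following is a minimal Python sketch of the two conditions a PPH solution must satisfy. It is not the dissertation's linear-time algorithm. It assumes a common encoding that the abstract does not spell out: haplotypes are 0/1 tuples, and a genotype site is 0 or 1 when both haplotypes agree and 2 when they differ. The function names `admits_perfect_phylogeny` and `explains` are hypothetical. The phylogeny check uses the well-known four-gamete test, which inspects every pair of sites and so takes O(nm²) time.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on a binary haplotype matrix (list of 0/1 tuples).

    A binary matrix fits an (unrooted) perfect phylogeny exactly when no
    pair of columns shows all four combinations 00, 01, 10, 11.
    """
    if not haplotypes:
        return True
    num_sites = len(haplotypes[0])
    for i, j in combinations(range(num_sites), 2):
        gametes = {(row[i], row[j]) for row in haplotypes}
        if len(gametes) == 4:
            return False
    return True


def explains(genotype, h1, h2):
    """True if the haplotype pair (h1, h2) is consistent with the genotype.

    Assumed encoding: site value 2 means the two haplotypes differ there;
    site value 0 or 1 means both haplotypes carry that value.
    """
    return all(
        (g == 2 and a != b) or (g != 2 and a == b == g)
        for g, a, b in zip(genotype, h1, h2)
    )
```

Under these assumptions, a set of haplotype pairs solves a PPH instance when every genotype is explained by its pair and the matrix of all chosen haplotypes passes `admits_perfect_phylogeny`.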
Recommended Citation
Liu, Yunkai, "Graph algorithms for the haplotyping problem" (2005). Graduate Theses, Dissertations, and Problem Reports. 4170.
https://researchrepository.wvu.edu/etd/4170