Semester

Summer

Date of Graduation

2005

Document Type

Dissertation

Degree Type

PhD

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Elaine M Eschen

Committee Co-Chair

Cun-Quan Zhang

Abstract

Investigations of genetic differences among human beings show that genetic diseases are often the result of genetic mutations. The most common form of these mutations is the single nucleotide polymorphism (SNP). A complete map of all SNPs in the human genome will be extremely valuable for studying the relationships between specific haplotypes and specific genetic diseases. Recent discoveries show that human DNA sequences can be partitioned into long blocks in which genetic recombination has been rare. Inferring both haplotypes from chromosome sequences is therefore a biologically meaningful research topic that poses both mathematical and computational problems.

We are interested in the algorithmic implications of inferring haplotypes from long blocks of DNA that have not undergone recombination in populations. This assumption justifies a model of haplotype evolution in which the haplotypes in a population evolve along a coalescent which, under the standard population-genetic assumption of infinite sites, is a perfect phylogeny when viewed as a rooted tree. The Perfect Phylogeny Haplotyping (PPH) problem was introduced by Daniel Gusfield in 2002. A nearly linear-time solution to the PPH problem is known, running in O(nm α(nm)) time, where α is the extremely slowly growing inverse Ackermann function. However, it is very complex and difficult to implement. So far, even the best practical solution to the PPH problem has a worst-case running time of O(nm^2). Gusfield conjectured that a linear-time (O(nm)) solution to the PPH problem should be possible.

We settle Gusfield's conjecture by introducing a linear-time algorithm for the PPH problem. Different kinds of posets for haplotype matrices and genotype matrices are designed, and the relationships between them are studied. Because the transitivity of the partial order in a poset allows redundant calculations to be avoided, we design a linear-time (O(nm)) algorithm for the PPH problem that produces all possible solutions for an input. The algorithm is fully implemented, and simulations show that it is much faster than previous methods.
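
To make the problem statement concrete, the following is a minimal brute-force sketch of what a PPH instance asks. It is not the dissertation's poset-based algorithm, and it runs in exponential time, so it is suitable only for tiny inputs. It assumes the common encoding in which a genotype entry is 0 or 1 when both haplotypes agree at that site and 2 when they differ, and it assumes an all-zero ancestral sequence, so that a set of haplotypes admits a rooted perfect phylogeny exactly when no pair of sites shows all three of the patterns (0,1), (1,0) and (1,1). The function names are hypothetical and chosen for illustration.

```python
from itertools import combinations, product


def has_perfect_phylogeny(haplotypes):
    """Rooted test, assuming an all-zero ancestor: no pair of sites
    may exhibit all three patterns (0,1), (1,0) and (1,1)."""
    if not haplotypes:
        return True
    num_sites = len(haplotypes[0])
    forbidden = {(0, 1), (1, 0), (1, 1)}
    for i, j in combinations(range(num_sites), 2):
        seen = {(h[i], h[j]) for h in haplotypes}
        if forbidden <= seen:
            return False
    return True


def expansions(genotype):
    """Yield each unordered haplotype pair that explains a genotype.
    Sites with value 2 are heterozygous: the two haplotypes differ there."""
    het = [k for k, g in enumerate(genotype) if g == 2]
    if not het:
        yield tuple(genotype), tuple(genotype)
        return
    # Fix the first heterozygous site of h1 to 0 to skip mirrored pairs.
    for rest in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for k, bit in zip(het, (0,) + rest):
            h1[k], h2[k] = bit, 1 - bit
        yield tuple(h1), tuple(h2)


def solve_pph_bruteforce(genotypes):
    """Return every assignment of haplotype pairs (one pair per genotype)
    whose combined haplotypes admit a perfect phylogeny."""
    solutions = []
    options = [list(expansions(g)) for g in genotypes]
    for choice in product(*options):
        haplotypes = [h for pair in choice for h in pair]
        if has_perfect_phylogeny(haplotypes):
            solutions.append(list(choice))
    return solutions


if __name__ == "__main__":
    # The genotype (2, 2) alone is ambiguous; the two homozygous
    # genotypes force it to resolve as (0, 1) and (1, 0).
    for solution in solve_pph_bruteforce([(2, 2), (0, 1), (1, 0)]):
        print(solution)
```

The sketch enumerates every way to resolve the heterozygous sites and keeps the resolutions that pass the pairwise test, which shows why an efficient algorithm is of interest: the number of candidate resolutions grows exponentially with the number of heterozygous sites.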
