Author ORCID Identifier
Semester
Spring
Date of Graduation
2024
Document Type
Dissertation
Degree Type
PhD
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Donald Adjeroh
Committee Member
Elaine Eschen
Committee Member
Gianfranco Doretto
Committee Member
Jeremy Dawson
Committee Member
Ivan Martinez
Committee Member
Granger Sutton
Abstract
The applied science of bioinformatics encompasses computational analysis of molecular biology data. Advances in genomics and DNA sequencing technology have enabled computational analysis of ribonucleic acids (RNAs), which play diverse and critical roles in most cells. To assist the study of human RNA, we trained machine learning models on RNA nucleotide sequences alone, without supplying domain knowledge. We built models that distinguish long non-coding RNA (lncRNA) from protein-coding messenger RNA (mRNA), and models that predict the cytoplasmic vs. nuclear localization preferences of lncRNAs. In a review of published lncRNA subcellular localization classifiers, we show that the commonly used validation protocol generates optimistic performance measures, and we propose a new benchmark for this application of machine learning. To assist the study of plant biology, we applied our own alignment-based method to the analysis of maternal vs. paternal imbalance of mRNA in seeds. We also generated initial results indicating how k-mer-based methods might complement our alignment-based methods. Finally, we developed and published a machine learning method that improved the accuracy of our alignment-based method in the specific case of detecting parental imbalance in interspecies hybrids. These results demonstrate several enhancements to the field of RNA bioinformatics through the application of machine learning.
Recommended Citation
Miller, Jason Rafe, "Machine Learning and RNA Bioinformatics" (2024). Graduate Theses, Dissertations, and Problem Reports. 12419.
https://researchrepository.wvu.edu/etd/12419
Embargo Reason
Publication Pending
Included in
Artificial Intelligence and Robotics Commons, Bioinformatics Commons, Biomedical Informatics Commons, Computational Biology Commons, Plant Breeding and Genetics Commons