Date of Graduation
2016
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Donald Adjeroh
Committee Co-Chair
Gianfranco Doretto
Committee Member
Elaine Eschen
Abstract
Cellular processes are significantly influenced by the interactions between different RNAs and proteins within cells. These interactions are crucial to understanding gene expression and gene regulation, and their roles in various diseases. Empirical and experimental methods for studying these interactions are hampered by high cost and the combinatorial nature of the problem. Consequently, computer science and machine learning methods have been applied to predict the interactions between RNAs and proteins.
RNAs are sequences of nucleotides, while proteins are sequences of amino acids. Protein secondary structure describes how the amino acids are locally arranged in three-dimensional space. Early methods predicted the interaction between RNA and protein using only sequence information. Recent methods have shown the significance of secondary structure in understanding RNA-protein interactions.
In this thesis, we explore prediction models for RNA-protein interaction using two different schemes. The first applied string algorithms to extract the most effective string patterns from both sequences and secondary structures. This method achieved a prediction accuracy of 93.39%. The second used a feature-based approach, combining features extracted from both sequences and secondary structures. Because it incorporated more of the available information, the feature-based approach improved the prediction accuracy to 94.77%.
Recommended Citation
Allaga, Maen, "RNA-protein interaction prediction: String-based versus feature-based models" (2016). Graduate Theses, Dissertations, and Problem Reports. 5073.
https://researchrepository.wvu.edu/etd/5073