Author

Maen Allaga

Date of Graduation

2016

Document Type

Thesis

Degree Type

MS

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Donald Adjeroh

Committee Co-Chair

Gianfranco Doretto

Committee Member

Elaine Eschen

Abstract

Cellular processes are significantly influenced by interactions between RNAs and proteins. These interactions are crucial to understanding gene expression, gene regulation, and their roles in various diseases. Empirical and experimental methods for studying them are hampered by high cost and by the combinatorial nature of the problem. Consequently, computer science and machine learning methods have been applied to predict RNA-protein interactions.

RNAs are sequences of nucleotides, while proteins are sequences of amino acids. Protein secondary structure describes the local spatial arrangement of the amino acid chain, such as helices and sheets. Early methods predicted RNA-protein interaction using only sequence information. More recent methods have shown the significance of secondary structure in understanding RNA-protein interactions.

In this thesis, we explore two schemes for predicting RNA-protein interaction. The first applies string algorithms to extract the most effective string patterns from both sequences and secondary structures, and achieved a prediction accuracy of 93.39%. The second uses a feature-based approach that combines features extracted from both sequences and secondary structures. Because it incorporates much more of the available information, the feature-based approach improved prediction accuracy to 94.77%.
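The abstract does not specify which features the feature-based approach extracts. As a minimal, hypothetical sketch of the general idea of combining sequence and secondary-structure information, the code below turns each string into normalized k-mer frequencies and concatenates them into one feature vector per RNA-protein pair. The k-mer choice, the alphabets (ACGU nucleotides, dot-bracket RNA structure, the 20 standard amino acids, three-state H/E/C protein structure), and the function names `kmer_frequencies` and `pair_features` are all assumptions for illustration, not the thesis's method.

```python
from collections import Counter
from itertools import product

# Assumed alphabets (illustrative only).
RNA_SEQ = "ACGU"                      # nucleotides
RNA_STRUCT = "()."                    # dot-bracket secondary structure
PROT_SEQ = "ACDEFGHIKLMNPQRSTVWY"     # 20 standard amino acids
PROT_STRUCT = "HEC"                   # helix / strand / coil

def kmer_frequencies(seq, alphabet, k):
    """Hypothetical helper: frequency of every k-mer over `alphabet`, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)  # avoid division by zero for short strings
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]

def pair_features(rna_seq, rna_struct, prot_seq, prot_struct, k=2):
    """Hypothetical: concatenate sequence and structure k-mer features for one pair."""
    return (kmer_frequencies(rna_seq, RNA_SEQ, k)
            + kmer_frequencies(rna_struct, RNA_STRUCT, k)
            + kmer_frequencies(prot_seq, PROT_SEQ, k)
            + kmer_frequencies(prot_struct, PROT_STRUCT, k))

if __name__ == "__main__":
    features = pair_features("ACGUAC", "((..))", "MKVLA", "HHEEC")
    print(len(features))  # 16 + 9 + 400 + 9 = 434 features for k=2
```

Vectors built this way, labeled as interacting or non-interacting pairs, could be passed to any standard classifier; dropping the two structure blocks gives a sequence-only baseline, which is one way to measure how much secondary structure contributes.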
