Jacob P. Tyo

Date of Graduation


Document Type


Degree Type



Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Katerina D Goseva-Popstojanova

Committee Co-Chair

Roy S Nutter

Committee Member

Matthew C Valenti


With the ever expanding amount of sensitive data being placed into computer systems, the need for effective cybersecurity is of utmost importance. However, there is a shortage of detailed empirical studies of security vulnerabilities from which cybersecurity metrics and best practices could be determined. This thesis has two main research goals: (1) to explore the distribution and characteristics of security vulnerabilities based on the information provided in bug tracking systems and (2) to develop data analytics approaches for automatic classification of bug reports as security or non-security related. This work is based on using three NASA datasets as case studies. The empirical analysis showed that the majority of software vulnerabilities belong only to a small number of types. Addressing these types of vulnerabilities will consequently lead to cost efficient improvement of software security. Since this analysis requires labeling of each bug report in the bug tracking system, we explored using machine learning to automate the classification of each bug report as a security or non-security related (two-class classification), as well as each security related bug report as specific security type (multiclass classification). In addition to using supervised machine learning algorithms, a novel unsupervised machine learning approach is proposed. Of the machine learning algorithms tested, Naive Bayes was the most consistent, well performing classifier across all datasets. The novel unsupervised approach did not perform as well as the supervised methods, but still performed well resulting in a G-Score of 0.715 in the case of best performance whereas the supervised approach achieved a G-Score of 0.903 in the case of best performance.