Semester
Summer
Date of Graduation
2019
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Katerina Goseva-Popstojanova
Committee Co-Chair
Roy Nutter
Committee Member
Roy Nutter
Committee Member
Matthew Valenti
Abstract
As the number of software vulnerabilities and cybersecurity threats increases, classifying bug reports manually is becoming more difficult and time-consuming. This thesis explores techniques that have the potential to improve the automated classification of software bug reports as security related or non-security related. For supervised learning, feature selection was used to engineer new feature vectors for the machine learning classifiers. Feature selection changes the vocabulary by selecting the words with the greatest impact on classification. It increased the F-score across the datasets by increasing precision. We also explored unsupervised classification based on clustering. A distribution of software issues was created using variational autoencoders, in which the majority of security-related issues were closely related. However, a portion of non-security issues also ended up in the distribution. Furthermore, we explored recent advances in text mining classification based on deep learning. Specifically, we used recurrent networks for supervised and semi-supervised classification. LSTM networks outperformed the Naive Bayes classifier in projects with a high ratio of security-related issues. Sequence autoencoders were trained on unlabeled data and tuned with labeled data. The results showed that using unlabeled software issues that differ from the testing datasets degraded classification performance. Sequence autoencoders may be used on large datasets where labeled data is scarce.
Recommended Citation
Gantzer, Tanner D., "Security Bug Report Classification using Feature Selection, Clustering, and Deep Learning" (2019). Graduate Theses, Dissertations, and Problem Reports. 4022.
https://researchrepository.wvu.edu/etd/4022
Embargo Reason
Publication Pending
Included in
Information Security Commons, Other Computer Engineering Commons, Software Engineering Commons