Statler College of Engineering and Mineral Resources
Lane Department of Computer Science and Electrical Engineering
Robyn R. Lutz
Roy S. Nutter
Software bugs are expensive to fix and can lead to catastrophic consequences. Therefore, their analysis and the use of machine learning for prediction are of the utmost importance. Many prediction models have been proposed, and different factors affecting prediction performance have been extensively studied. This work addresses four topics in two areas of software engineering: software fault-proneness prediction, and the analysis and classification of security-related bug reports. The first topic focuses on the effect of the learning approach (i.e., the way software fault-proneness prediction models are trained and tested) on the performance of software fault-proneness prediction, an effect that has not been extensively researched. The second topic focuses on the effect of imbalanced datasets and the choice of datasets on prediction performance. The third topic focuses on the empirical analysis of the characteristics of security-related bug reports in open source operating systems. The final topic focuses on the classification of security-related bug reports in open source projects.
In the first part, we explore the effect of two learning approaches, useAllPredictAll and usePrePredictPost, on the performance of software fault-proneness prediction, both within-release and across-releases, at the file level. The empirical results are based on datasets extracted from 64 releases of twelve different Apache Open Source Software projects. Based on a nested design of experiments with two factors and tests of statistical significance, our results show that the prediction performance is highly affected by the choice of the learning approach, implying that the learning approach must be clearly identified and explicitly considered when reporting and comparing software fault-proneness prediction results.
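The difference between the two learning approaches can be illustrated with a small sketch. The abstract does not define them precisely, so the code below rests on an assumption drawn only from their names: usePrePredictPost trains on data that precedes the data being predicted, while useAllPredictAll pools the data and splits it without regard to time order. The record layout and the function names are hypothetical, and only the across-release case is shown.

```python
from typing import List, Tuple

# Hypothetical file-level record: (release index, metric vector, is_fault_prone).
Record = Tuple[int, List[float], bool]


def use_pre_predict_post(records: List[Record], target_release: int):
    """Assumed reading of usePrePredictPost: train only on releases that
    precede the target release, then test on the target release."""
    train = [r for r in records if r[0] < target_release]
    test = [r for r in records if r[0] == target_release]
    return train, test


def use_all_predict_all(records: List[Record], k: int = 2):
    """Assumed reading of useAllPredictAll: k-fold splits over the pooled
    records, ignoring which release each record came from."""
    folds = [records[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        splits.append((train, test))
    return splits
```

Under this reading, a pooled split can place files from a later release in the training set while predicting an earlier one, which is one plausible reason the two approaches would yield different performance figures.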
In the second part, we explore the use of the Group Lasso Regression machine learning algorithm (G-Lasso) and six other machine learning algorithms, and the effects of two factors on the software fault-proneness prediction performance: the imbalance treatment using the Synthetic Minority Over-sampling Technique (SMOTE), and the datasets used in building the prediction models. Our empirical results are based on 22 datasets extracted from open source projects. The main findings include:
(1) SMOTE improved the performance of all learners, but it did not have a statistically significant effect on G-Lasso's Recall and G-Score. Random Forest was in the top performing group of learners for all performance metrics, while Naive Bayes performed the worst of all learners.
(2) The choice of the dataset had no effect on the performance of most learners, including G-Lasso. Naive Bayes was the most affected, especially when balanced datasets were used.
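For readers unfamiliar with SMOTE, the following is a minimal sketch of its core idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. This is an illustration of the general technique, not the implementation used in this work; the function name, parameters, and defaults are assumptions.

```python
import random
from typing import List


def smote(minority: List[List[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_synthetic samples by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic
```

In a setup like the one described above, oversampling of this kind would typically be applied to the training data only, so that the test data keeps its original class distribution.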
The third topic focuses on the characteristics of security-related bug reports and the differences between security-related and non-security-related bug reports in three widely-used operating systems. This part serves as a replication study that examines research questions previously explored in several related works.
Our analysis shows that most security-related bug reports (1) appeared in only 7% to 34% of the source code packages, (2) were somewhat similar across the studied projects (i.e., they shared the same top eight dominant vulnerability classes, and 76% to 92% of the CWEs belonged to only five CWE classes), (3) had medium severity and priority levels, and (4) had a shorter initial response time and were fixed faster than non-security-related bug reports.
The final topic of this dissertation focuses on the classification of bug reports as security-related or non-security-related, a field that has recently gained a lot of attention, motivated by the increasing number of security threats and attacks. We proposed a hybrid multimodal machine learning approach that uses feature-level and decision-level fusion strategies to integrate the bug reports' text and bug tracking system modalities for the classification of the bug reports. The proposed approach improved the classification performance significantly for RHL and Ubuntu, and slightly for Fedora. Specifically, it improved the F-Score and G-Score by 71.3% and 37.9%, respectively, for RHL, and by 9.3% and 11.1%, respectively, for Ubuntu. The improvement was smallest for Fedora, where the F-Score and G-Score improved by 3.0% and 2.5%, respectively.
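The two fusion strategies named above can be sketched in general terms. Feature-level fusion combines the modalities before classification, here by concatenating their feature vectors; decision-level fusion combines the outputs of per-modality classifiers, here by a weighted average of their predicted probabilities. The abstract does not state how the hybrid approach combines them, so the averaging rule, the threshold, and all names below are assumptions for illustration only.

```python
from typing import List, Optional, Sequence


def feature_level_fusion(text_features: Sequence[float],
                         tracker_features: Sequence[float]) -> List[float]:
    """Concatenate the text and bug-tracker feature vectors into one
    vector that a single classifier would be trained on."""
    return list(text_features) + list(tracker_features)


def decision_level_fusion(probabilities: Sequence[float],
                          weights: Optional[Sequence[float]] = None,
                          threshold: float = 0.5) -> bool:
    """Combine per-modality probabilities of 'security-related' with a
    weighted average (an assumed combiner) and apply a threshold."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    fused = sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)
    return fused >= threshold
```

For example, `decision_level_fusion([0.9, 0.3])` averages a confident text-based prediction with a weaker tracker-based one and labels the report security-related, whereas giving the second modality more weight can reverse that decision.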
Ahmad, Mohammad Jamil, "Analysis and Classification Of Software Fault-Proneness And Vulnerabilities" (2021). Graduate Theses, Dissertations, and Problem Reports. 8323.