Semester

Fall

Date of Graduation

2009

Document Type

Dissertation

Degree Type

PhD

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Bojan Cukic

Committee Co-Chair

Tim Menzies

Abstract

It is difficult to build high-quality software with limited quality assurance budgets. Software fault prediction models can be used to learn fault predictors from software metrics. Fault prediction prior to software release can guide Verification and Validation (V&V) activity and allocate scarce resources to modules which are predicted to be fault-prone.

One of the most important goals of fault prediction is to detect fault-prone modules as early as possible in the software development life cycle. Design and code metrics have been successfully used for predicting fault-prone modules. In this dissertation, we introduce fault prediction from software requirements. Furthermore, we investigate the advantages of the incremental development of software fault prediction models, comparing their performance as the volume of training data and its life cycle origin (design metrics, code metrics, or their combination) evolve during project development. We confirm that increasing the volume of training data improves model performance, and that models built from code metrics typically outperform those built from design metrics alone. However, both types of models prove to be useful, as they can be constructed in different phases of the life cycle. We also demonstrate that models which utilize a combination of design and code level metrics outperform models which use either metric set exclusively.

In the evaluation of fault prediction models, misclassification cost has been neglected. Using a graphical measure, the cost curve, we evaluate software fault prediction models. Cost curves allow software quality engineers to introduce not only project-specific but also module-specific misclassification costs into model evaluation. Classifying a software module as fault-prone implies the application of some verification activities, thus adding to the development cost. Misclassifying a module as fault-free carries the risk of system failure, which has its own cost implications. Our results, drawn from the analysis of more than ten projects from public repositories, support a recommendation to adopt cost curves as one of the standard methods for evaluating the performance of software fault prediction models.
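To make the cost curve concrete: in the standard construction, a classifier with false positive rate FPR and false negative rate FNR is drawn as the straight line from (0, FPR) to (1, FNR) in a plot of normalized expected cost against the probability-cost function PC(+). The Python sketch below illustrates this construction only; the operating points are hypothetical numbers chosen for illustration, not results from the dissertation.

    import numpy as np
    import matplotlib.pyplot as plt

    def cost_curve_line(fpr, fnr, label):
        """Plot one classifier as a line on the cost curve.

        x-axis: probability-cost function
            PC(+) = p(+)*C(FN) / (p(+)*C(FN) + p(-)*C(FP))
        y-axis: normalized expected cost
            NEC = FNR*PC(+) + FPR*(1 - PC(+))
        A classifier with rates (FPR, FNR) is the straight line
        from (0, FPR) to (1, FNR).
        """
        pc = np.linspace(0.0, 1.0, 101)
        nec = fnr * pc + fpr * (1.0 - pc)
        plt.plot(pc, nec, label=label)

    # Hypothetical operating points for two fault predictors
    # (illustrative numbers only).
    cost_curve_line(fpr=0.30, fnr=0.10, label="code-metric model")
    cost_curve_line(fpr=0.25, fnr=0.25, label="design-metric model")

    # Trivial reference classifiers: "predict everything fault-free"
    # and "predict everything fault-prone".
    cost_curve_line(fpr=0.0, fnr=1.0, label="always fault-free")
    cost_curve_line(fpr=1.0, fnr=0.0, label="always fault-prone")

    plt.xlabel("probability cost PC(+)")
    plt.ylabel("normalized expected cost")
    plt.legend()
    plt.show()

Reading such a plot, the model whose line lies lowest at a given PC(+) is the cheapest choice for that combination of fault rate and misclassification costs, which is how project-specific costs enter the comparison.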
