Date of Graduation


Document Type


Degree Type



Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Tim Menzies

Committee Co-Chair

Thirimachos Bourlai

Committee Member

Katerina Goseva-Popstojanova


Most machine learning techniques rely on a set of user-defined parameters. Changes in the values of these parameters can greatly affect the prediction performance of the learner. These parameters are typically either set to default values or tuned for best performance on a particular type of data. In this thesis, the parameter space of four machine learners is explored in order to determine the efficacy of parameter tuning within the context of software defect prediction.

A distinction is made between the typical within-version learning scheme and forward learning, in which learners are trained on defect data from one software version and used to predict defects in the following version. The efficacy of selecting parameters based on within-version tuning and applying those parameters to forward learning is tested. This is done by means of a cross-validated parameter-space grid search, with each tuning's performance compared to the performance of the default tuning given the same data.

For the Bernoulli naive Bayes classifier and the random forest classifier, it is found that within-version parameter tuning is a viable strategy for increasing forward learning performance. For the logistic regression classifier, it is found that tuning can be effective within a single version, but parameters learned in this manner do not necessarily perform well in the forward learning case. For the multinomial naive Bayes classifier, no substantial evidence for the efficacy of parameter tuning is found.
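The tuning scheme described above can be sketched in scikit-learn. This is an illustrative sketch, not the thesis's actual code: the data is synthetic, and the random forest parameter grid and metrics are placeholder assumptions. It shows the shape of the experiment, tuning via cross-validated grid search within one version, then carrying the selected parameters forward to the next version and comparing against the default tuning.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

def make_version(n=200):
    # Synthetic stand-in for one software version's defect data:
    # five code metrics per module and a binary defect label.
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_v1, y_v1 = make_version()   # version used for within-version tuning
X_v2, y_v2 = make_version()   # following version (forward learning target)

# Within-version tuning: cross-validated grid search over a small,
# illustrative parameter space for the random forest learner.
grid = {"n_estimators": [10, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5)
search.fit(X_v1, y_v1)

# Forward learning: train on version 1 with the tuned parameters,
# then predict defects in version 2; compare against the default tuning.
tuned = RandomForestClassifier(random_state=0, **search.best_params_)
tuned.fit(X_v1, y_v1)
default = RandomForestClassifier(random_state=0).fit(X_v1, y_v1)

print("tuned params:", search.best_params_)
print("tuned forward accuracy:   %.3f" % tuned.score(X_v2, y_v2))
print("default forward accuracy: %.3f" % default.score(X_v2, y_v2))
```

The thesis evaluates more learners and real defect datasets; the point of the sketch is only the two-stage design, where `best_params_` found within one version is reused on the next.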