Date of Graduation


Document Type


Degree Type



Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Tim Menzies.


Software effort estimation (SEE) is the activity of estimating the total effort required to complete a software project. Correctly estimating the effort required for a software project is of vital importance for the competitiveness of the organizations. Both under- and over-estimation leads to undesirable consequences for the organizations. Under-estimation may result in overruns in budget and schedule, which in return may cause the cancellation of projects; thereby, wasting the entire effort spent until that point. Over-estimation may cause promising projects not to be funded; hence, harming the organizational competitiveness.;Due to the significant role of SEE for software organizations, there is a considerable research effort invested in SEE. Thanks to the accumulation of decades of prior research, today we are able to identify the core issues and search for the right principles to tackle pressing questions. For example, regardless of decades of work, we still lack concrete answers to important questions such as: "What is the best SEE method?" The introduced estimation methods make use of local data, however not all the companies have their own data, so: "How can we handle the lack of local data?" Common SEE methods take size attributes for granted, yet size attributes are costly and the practitioners place very little trust in them. Hence, we ask: "How can we avoid the use of size attributes?" Collection of data, particularly dependent variable information (i.e. effort values) is costly: "How can find an essential subset of the SEE data sets?" Finally, studies make use of sampling methods to justify a new method's performance on SEE data sets. Yet, trade-off among different variants is ignored: "How should we choose sampling methods for SEE experiments?";This thesis is a rigorous investigation towards identification and tackling of the pressing issues in SEE. Our findings rely on extensive experimentation performed with a large corpus of estimation techniques on a large set of public and proprietary data sets. We summarize our findings and industrial experience in the form of 12 principles: 1) Know your domain 2) Let the Experts Talk 3) Suspect your data 4) Data Collection is Cyclic 5) Use a Ranking Stability Indicator 6) Assemble Superior Methods 7) Weighting Analogies is Over-elaboration 8) Use Easy-path Design 9) Use Relevancy Filtering 10) Use Outlier Pruning 11) Combine Outlier and Synonym Pruning 12) Be Aware of Sampling Method Trade-off.