Date of Graduation

2015

Document Type

Thesis

Degree Type

MS

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Katerina Goseva-Popstojanova

Committee Co-Chair

Hany Ammar

Committee Member

Roy S. Nutter

Abstract

This thesis addresses several aspects of using static code analysis tools for the detection of security vulnerabilities and faults in source code. First, the performance of three widely used static code analysis tools with respect to detecting security vulnerabilities is evaluated using a large benchmarking suite designed specifically for that purpose, as well as three open source software projects with known security vulnerabilities. The main results of this first part showed that the three tools do not differ significantly in their ability to detect security vulnerabilities. Across the benchmark, 27% of the C/C++ vulnerabilities and 11% of the Java vulnerabilities were not detected by any of the three tools. Furthermore, overall recall values for all three tools were close to or below 50%, indicating performance comparable to or worse than random guessing. These results were corroborated by the tools' performance on the three real software projects.

The second part of the thesis focuses on machine-learning-based classification of messages extracted from static code analysis reports, using data from five real NASA software projects. For each of the five datasets, a classifier is trained on increasing percentages of labeled data in order to emulate an ongoing analysis effort. The results showed that classification performance depends heavily on how true and false positives are distributed among source code files. One of the five datasets yielded good predictive classification of true positives, one more led to acceptable performance, and the remaining three failed to yield good results. Investigating the distribution of true and false positives revealed that messages were classified successfully when files, or individual checkers, contained only real faults or only false positives. High percentages of false positive singletons (files or checkers that produced no true positives and exactly one false positive) were found to degrade the classifier's performance.
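As a concrete illustration of the first evaluation, the sketch below scores a single tool's warnings against a benchmark with per-test-case ground truth and computes the recall and precision figures the abstract refers to. The TestCase structure, the flagged_by_tool field, and the CWE-style case names are hypothetical placeholders for illustration, not the thesis's actual benchmark format.

# Minimal sketch: scoring one static analysis tool against a benchmark
# suite with known ground truth. All names here are illustrative.

from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    has_vulnerability: bool  # ground truth: is a real vulnerability present?
    flagged_by_tool: bool    # did the tool report a warning for this case?

def score(cases):
    """Compute recall and precision over benchmark test cases."""
    tp = sum(c.has_vulnerability and c.flagged_by_tool for c in cases)
    fn = sum(c.has_vulnerability and not c.flagged_by_tool for c in cases)
    fp = sum(not c.has_vulnerability and c.flagged_by_tool for c in cases)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

cases = [
    TestCase("CWE121_buffer_overflow_01", True, True),
    TestCase("CWE121_buffer_overflow_02", True, False),
    TestCase("CWE89_sql_injection_01", True, False),
    TestCase("CWE89_sql_injection_01_good", False, True),
]
recall, precision = score(cases)
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.33 precision=0.50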
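The experimental setup of the second part can be sketched in the same spirit. The fragment below trains a classifier on a growing fraction of labeled messages and evaluates it on the remainder, emulating an ongoing analysis effort. The synthetic features and the choice of scikit-learn's RandomForestClassifier are assumptions made for illustration, not the thesis's actual features or learner.

# Minimal sketch of the incremental-labeling experiment: train on an
# increasing percentage of labeled messages, test on the rest.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)

# Placeholder data: each row stands in for one static analysis message
# (e.g., checker id, file metrics); y is 1 for a true positive message
# and 0 for a false positive.
X = rng.random((500, 10))
y = rng.integers(0, 2, 500)

for pct in (0.05, 0.10, 0.25, 0.50, 0.75):
    n_train = int(len(X) * pct)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[:n_train], y[:n_train])
    pred = clf.predict(X[n_train:])
    print(f"labeled={pct:.0%} recall={recall_score(y[n_train:], pred):.2f}")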
