Semester
Summer
Date of Graduation
2019
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Yanfang Ye
Committee Co-Chair
Xin Li
Committee Member
Xin Li
Committee Member
Elaine Eschen
Abstract
As the popularity of modern social coding paradigm such as Stack Overflow grows, its potential security risks increase as well (e.g., insecure codes could be easily embedded and distributed). To address this largely overlooked issue, we bring a new insight to exploit social coding properties in addition to code content for automatic detection of insecure code snippets in Stack Overflow. To determine if the given code snippets are insecure, we not only analyze the code content, but also utilize various kinds of relations among users, badges, questions, answers, code snippets and keywords in Stack Overflow. To model the rich semantic relationships, we first introduce a structured heterogeneous information network (HIN) for representation and then use meta-path based approach to incorporate higher-level semantics to build up relatedness over code snippets. Later, we propose two different novel network embedding models named Snippet2vec and CodeHin2Vec for representation learning in HIN to automate the insecure code snippet detection in Stack Overflow. More specifically, Snippet2vec learns the low dimensional representations for the nodes (i.e., code snippets) in the HIN where both the HIN structures and semantics are maximally preserved, while CodeHin2Vec utilizes HIN to depict relatedness over code snippets to generate code-to-code sequences, based on which sequence-to-sequence (seq2seq) concept in machine translation is further leveraged to learn representations of code snippets. Accordingly, we developed systems ICSD and iTrustSO which integrate our proposed methods respectively in insecure code snippet detection in Stack Overflow. Comprehensive experiments on the data collections from Stack Overflow are conducted to validate the effectiveness of our developed systems by comparisons with the state-of-the-art baseline methods.
Recommended Citation
Hou, Shifu, "Automatic Detection of Insecure Codes in Stack Overflow" (2019). Graduate Theses, Dissertations, and Problem Reports. 4069.
https://researchrepository.wvu.edu/etd/4069
Embargo Reason
Publication Pending