Semester
Spring
Date of Graduation
2022
Document Type
Problem/Project Report
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Matthew Valenti
Committee Co-Chair
Thomas Devine
Committee Member
Thomas Devine
Committee Member
Roy Nutter
Abstract
Many databases contain private or sensitive data that should be accessible to only a few individuals, such as HIPAA (Health Insurance Portability and Accountability Act) protected or LE (law enforcement) data. However, developers in particular often need to work with or change the data for proper and thorough testing. In some cases, the developers may be authorized to access and view the data, but it is rarely allowable for that data to be changed. Further, it is unlikely, especially on a large project, that all of the developers will have the authorization to view the data. In this case, it can be valuable to have easily creatable synthetic or 'fake' data to fill the database, data that mimics the real data closely enough to be used in all the same tests and to develop endpoints and APIs that will work with the real data. There are many possible ways to achieve this, such as shuffling the sensitive data or filling the sensitive fields with garbled information. There are, however, drawbacks to such methods, as the data then becomes unwieldy or nonsensical to work with. Therefore, for this study, a Python library called Factory Boy was used. Factory Boy can inherit the Django database models and then be used to generate randomized but realistic-looking data, capable of mimicking all the complexities of actual database relationships and information.
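As a rough illustration of the three approaches the abstract contrasts (shuffling, garbling, and generating realistic substitutes), the following is a minimal standard-library sketch. It is not Factory Boy's API and not the report's implementation; the record types `Officer` and `IncidentReport`, the name and offense pools, and all helper functions are hypothetical stand-ins chosen only to show the idea.

```python
import random
import string
from dataclasses import dataclass


# Hypothetical record types; the report's actual Django models are not shown here.
@dataclass
class Officer:
    badge_number: int
    name: str


@dataclass
class IncidentReport:
    report_id: int
    officer: Officer      # relationship, analogous to a foreign key
    subject_name: str
    offense: str


# Small illustrative pools of plausible values (assumed, not from the report).
FIRST_NAMES = ["Alice", "Brian", "Carmen", "David", "Elena"]
LAST_NAMES = ["Nguyen", "Smith", "Okafor", "Lopez", "Miller"]
OFFENSES = ["Trespassing", "Vandalism", "Shoplifting"]


def shuffle_column(values, rng):
    """Shuffling: the real values survive, only reassigned to other rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def garble(value, rng):
    """Garbling: keeps the length but replaces every character with a random letter."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in value)


def fake_name(rng):
    """Synthetic generation: a plausible name that matches no real record."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def make_officer(rng, badge_number):
    return Officer(badge_number=badge_number, name=fake_name(rng))


def make_reports(rng, count, officers):
    """Build reports that each point at one of the given officers."""
    return [
        IncidentReport(
            report_id=i + 1,
            officer=rng.choice(officers),
            subject_name=fake_name(rng),
            offense=rng.choice(OFFENSES),
        )
        for i in range(count)
    ]


rng = random.Random(2022)  # fixed seed so a test database can be rebuilt identically

# Placeholder strings standing in for sensitive values.
real_names = ["Jane Roe", "John Doe", "Sam Poe"]
shuffled = shuffle_column(real_names, rng)       # still exposes the real values
garbled = [garble(n, rng) for n in real_names]   # safe, but nonsensical to work with

officers = [make_officer(rng, 1000 + i) for i in range(2)]
reports = make_reports(rng, 5, officers)         # readable, with valid relationships
```

In this sketch, shuffling still leaks the original values and garbling yields strings no tester can reason about, while the generated records stay readable and keep a valid report-to-officer link. A library such as Factory Boy is designed to provide this kind of generation declaratively on top of existing model definitions.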
Recommended Citation
Carrola, Anthony, "Synthesizing Realistic Substitute Data for a Law Enforcement Database using a Python Library" (2022). Graduate Theses, Dissertations, and Problem Reports. 11275.
https://researchrepository.wvu.edu/etd/11275