Semester

Spring

Date of Graduation

2022

Document Type

Problem/Project Report

Degree Type

MS

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Matthew Valenti

Committee Co-Chair

Thomas Devine

Committee Member

Thomas Devine

Committee Member

Roy Nutter

Abstract

In many databases, there is private or sensitive data that should be accessible to only a few individuals, such as data protected under HIPAA (Health Insurance Portability and Accountability Act) or LE (law enforcement) data. However, developers often need to work with the data, or change it, to test thoroughly. In some cases the developers may be authorized to access and view the data, but they are rarely allowed to change it. Further, it is unlikely, especially on a large project, that every developer will be authorized to view the data. In these cases it is useful to have easily generated synthetic, or 'fake', data that fills the database and mimics the real data closely enough to be used in all the same tests and to develop endpoints and APIs that will work with the real data. There are many ways to achieve this, such as shuffling the sensitive values between records or replacing them with garbled information. These methods have drawbacks, however, because the resulting data becomes unwieldy or nonsensical to work with. Therefore, this study used a Python library called Factory Boy. Factory Boy factories can be built on top of existing Django database models and then used to generate randomized but realistic-looking data that mimics the complexities of actual database relationships and information.
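To make the factory idea concrete, the following is a minimal, standard-library-only sketch of the pattern that Factory Boy automates. It does not use Factory Boy itself: the `Physician` and `Patient` classes, the name lists, and the `PhysicianFactory`/`PatientFactory` helpers are hypothetical stand-ins for Django models and their factories. Each factory fills every field with randomized but plausible values, lets a test override any field it cares about, and builds related records on demand, much like a foreign key.

```python
import random
from dataclasses import dataclass


# Hypothetical record types standing in for two related Django models.
@dataclass
class Physician:
    physician_id: int
    name: str


@dataclass
class Patient:
    patient_id: int
    name: str
    age: int
    physician: Physician  # plays the role of a ForeignKey to Physician


# Small illustrative pools of plausible values (not real people).
FIRST_NAMES = ["Alice", "Ben", "Carmen", "Dev", "Elena", "Farid"]
LAST_NAMES = ["Nguyen", "Smith", "Okafor", "Garcia", "Kowalski", "Patel"]


class PhysicianFactory:
    """Builds fake Physician records with sequential ids."""

    def __init__(self, rng: random.Random) -> None:
        self._rng = rng
        self._next_id = 1

    def build(self, **overrides) -> Physician:
        fields = {
            "physician_id": self._next_id,
            "name": f"Dr. {self._rng.choice(LAST_NAMES)}",
        }
        self._next_id += 1
        fields.update(overrides)  # explicit values win over random ones
        return Physician(**fields)


class PatientFactory:
    """Builds fake Patient records, creating a related Physician when needed."""

    def __init__(self, rng: random.Random, physicians: PhysicianFactory) -> None:
        self._rng = rng
        self._physicians = physicians
        self._next_id = 1

    def build(self, **overrides) -> Patient:
        fields = {
            "patient_id": self._next_id,
            "name": f"{self._rng.choice(FIRST_NAMES)} {self._rng.choice(LAST_NAMES)}",
            "age": self._rng.randint(0, 100),
        }
        # Only build a new related record if the caller did not supply one.
        if "physician" not in overrides:
            fields["physician"] = self._physicians.build()
        self._next_id += 1
        fields.update(overrides)
        return Patient(**fields)

    def build_batch(self, size: int, **overrides) -> list[Patient]:
        return [self.build(**overrides) for _ in range(size)]


# Usage: a fixed seed makes the fake data reproducible from run to run.
rng = random.Random(42)
physicians = PhysicianFactory(rng)
patients = PatientFactory(rng, physicians)

shared_doctor = physicians.build()
ward = patients.build_batch(3, physician=shared_doctor)
walk_in = patients.build()  # gets its own freshly built physician
```

In Factory Boy itself, the same declarations would typically be written as a subclass of `factory.django.DjangoModelFactory` whose inner `Meta.model` names the Django model, with fields declared through `factory.Sequence`, `factory.Faker`, and `factory.SubFactory`. The sketch only mirrors that shape so the idea can be run without Django or Factory Boy installed.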
