Date of Graduation


Document Type


Degree Type



Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Bojan Cukic

Committee Member

Jeremy Dawson

Committee Member

Donald Adjeroh


One of the reasons testing biometric systems is difficult lies in the fact that the test sample available during technology evaluation may not be sufficiently similar to the usage profile that the system will encounter in operation. As a result, performance expectations derived from testing prior to system deployment may not match actual performance after deployment. A specific instance of this problem occurs when the data from the field in which the biometric system will be deployed is sequestered.

In this study, we simulated this scenario using two datasets originally claimed to be similar and adequate for performance prediction: dataset A, assembled by the RAND Corporation, and the “sequestered” dataset B, collected in a multi-year project at West Virginia University sponsored by the FBI. Our objective was to select biometric samples from A that yield match scores that best represent the performance of B, thus enabling accurate prediction.

We developed two groups of approaches. In the first, we select biometric samples from dataset A at random. The subset whose overall match score distribution is most similar to that of dataset B then represents the best test sample. We use the probability mass function to build distributions based on the percentage of score vectors. The Kullback-Leibler divergence, a statistical measure, quantifies how similar two probability mass functions are. The second group of approaches assigns carefully chosen weights to match scores. The samples are inserted into a list sorted by weight, and a number of them are picked to form the test subset most similar to dataset B. ROC curves are used to measure how well test performance over subsets of dataset A compares to that of dataset B. Weighted selection approaches allow us to construct test datasets that successfully predict the match performance of the sequestered biometric dataset.
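The first group of approaches (random subset selection scored by Kullback-Leibler divergence) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the histogram binning, the small smoothing constant, the number of random trials, and the direction of the divergence (dataset B as the reference distribution) are all assumptions made for illustration.

```python
import numpy as np

def score_pmf(scores, bin_edges, eps=1e-9):
    """Estimate a probability mass function of match scores over fixed bins.

    eps is an assumed smoothing constant that keeps every bin strictly
    positive, so the divergence below is always finite.
    """
    counts, _ = np.histogram(scores, bins=bin_edges)
    counts = counts.astype(float) + eps
    return counts / counts.sum()

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two PMFs on the same bins."""
    return float(np.sum(p * np.log(p / q)))

def best_random_subset(scores_a, scores_b, subset_size, n_trials, bin_edges, seed=0):
    """Draw random subsets of A and keep the one whose PMF is closest to B's.

    Returns the indices of the selected scores in A and the divergence
    achieved. Treating B as the reference distribution is an assumption.
    """
    rng = np.random.default_rng(seed)
    scores_a = np.asarray(scores_a, dtype=float)
    pmf_b = score_pmf(scores_b, bin_edges)
    best_idx, best_kl = None, np.inf
    for _ in range(n_trials):
        # Sample without replacement so each score appears at most once.
        idx = rng.choice(len(scores_a), size=subset_size, replace=False)
        kl = kl_divergence(pmf_b, score_pmf(scores_a[idx], bin_edges))
        if kl < best_kl:
            best_idx, best_kl = idx, kl
    return best_idx, best_kl
```

In this sketch both datasets must share the same `bin_edges` so that their probability mass functions are defined over identical bins; a lower returned divergence indicates a subset of A whose score distribution more closely resembles that of B.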