Semester
Fall
Date of Graduation
2024
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Jeremy Dawson
Committee Member
Nima Karimian
Committee Member
Prashnna Gyawali
Abstract
Speaker recognition is not a new biometric modality, but many obstacles remain before it can become as widely used as fingerprint recognition, facial recognition, and iris recognition. Real-world environmental conditions, hardware device variations, and human behavior all present serious challenges to the use of opportunistic voice or speaker samples for identification purposes. Common non-idealities, referred to here as nuisance factors, include environmental noise, input device quality, length of utterance, sample rate variation, and unscripted data, all of which can impact speaker recognition match score performance. The impact of these nuisance factors was evaluated using multiple ‘black-box’ speaker recognition software tools. Results show that the Phonexia Voice Inspector (P) matching tool outperformed VeriSpeak from Neurotechnology (V) in all of the following performance metrics: Area Under the Curve (AUC), Equal Error Rate (EER), and the Area of Intersection (AoI) on a Probability Density Function (PDF) graph. The last performance metric, Kullback-Leibler Divergence (KLD), computed for both genuine and imposter distributions, shows similar scores across all genuine distributions and across all imposter distributions, indicating that comparisons between scores from the two software tools were fair after score normalization.
The nuisance factor that most impacted match scores on both V and P was downsampling, especially when the sample rate was reduced to 4 kHz from an original sample rate of either 44 kHz or 48 kHz. For P, downsampling compounded with environmental noise, input device quality, and unscripted data produced the worst AUC, EER, and AoI. For V, the worst performance metrics came from downsampling and unscripted data from the highest quality input device.
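For readers unfamiliar with the four metrics named above, the following is a minimal sketch of how AUC, EER, AoI, and KLD might be computed from lists of genuine and imposter match scores. It is not the thesis's implementation: the function names, the histogram-based density estimate, the bin count, and the example scores are all assumptions made for illustration, and scores are assumed to be already normalized to the range 0 to 1.

```python
import math

def auc(genuine, imposter):
    """Probability that a random genuine score exceeds a random imposter score."""
    wins = 0.0
    for g in genuine:
        for i in imposter:
            wins += 1.0 if g > i else 0.5 if g == i else 0.0  # ties count half
    return wins / (len(genuine) * len(imposter))

def eer(genuine, imposter):
    """Sweep thresholds and return the error rate where FAR and FRR are closest."""
    best_gap, best_rate = float("inf"), None
    for t in sorted(set(genuine) | set(imposter)):
        far = sum(s >= t for s in imposter) / len(imposter)  # false accept rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false reject rate
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate

def histogram(scores, lo=0.0, hi=1.0, bins=5):
    """Normalized histogram over [lo, hi]; bin probabilities sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in scores:
        counts[min(int((s - lo) / width), bins - 1)] += 1
    return [c / len(scores) for c in counts]

def area_of_intersection(p, q):
    """Overlap of two normalized histograms: 0 is disjoint, 1 is identical."""
    return sum(min(a, b) for a, b in zip(p, q))

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) in nats; eps is an assumed smoothing term to avoid log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)

# Hypothetical normalized scores, invented for illustration only.
genuine = [0.91, 0.83, 0.72, 0.61]
imposter = [0.12, 0.23, 0.31, 0.66]

p_gen, p_imp = histogram(genuine), histogram(imposter)
print("AUC:", auc(genuine, imposter))
print("EER:", eer(genuine, imposter))
print("AoI:", area_of_intersection(p_gen, p_imp))
print("KLD(genuine || imposter):", kl_divergence(p_gen, p_imp))
```

In this sketch a higher AUC and a lower EER and AoI indicate better separation between genuine and imposter scores. A KLD near zero between two tools' genuine (or imposter) histograms would suggest that their normalized score distributions are comparable, which is the role KLD plays in the abstract.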
Recommended Citation
Meighen, Ethan David, "Effect of Specific Data Variations on Automated Speaker Recognition" (2024). Graduate Theses, Dissertations, and Problem Reports. 12710.
https://researchrepository.wvu.edu/etd/12710
Included in
Medical Biomathematics and Biometrics Commons, Numerical Analysis and Scientific Computing Commons