"Effect of Specific Data Variations on Automated Speaker Recognition" by Ethan David Meighen

Author ORCID Identifier

https://orcid.org/0009-0006-1886-4861

Semester

Fall

Date of Graduation

2024

Document Type

Thesis

Degree Type

MS

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Jeremy Dawson

Committee Member

Nima Karimian

Committee Member

Prashnna Gyawali

Abstract

Speaker recognition is not a new biometric modality, but many obstacles remain before it can become as widely used as fingerprint, facial, and iris recognition. Real-world environmental conditions, hardware device variations, and human behavior present serious challenges to using opportunistic voice or speaker samples for identification. Common non-idealities, referred to as nuisance factors, include environmental noise, input device quality, utterance length, sample rate variation, and unscripted data; each can affect speaker recognition match score performance. The impact of these nuisance factors was evaluated using multiple ‘black-box’ speaker recognition software tools. Results show that the Phonexia Voice Inspector (P) matching tool outperformed VeriSpeak from Neurotechnology (V) on every one of the following performance metrics: Area Under the Curve (AUC), Equal Error Rate (EER), and the Area of Intersection (AoI) between the genuine and imposter score distributions on a Probability Density Function (PDF) graph. The final performance metric, Kullback-Leibler Divergence (KLD), yielded similar values across all genuine distributions and across all imposter distributions, indicating that comparisons between scores from the two software tools were fair after score normalization.
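The abstract names its comparison metrics but not how they were computed, and the tools themselves are black boxes. As a hedged illustration only, the stdlib-only Python sketch below shows one common way to compute these metrics from lists of genuine and imposter match scores. Several choices are assumptions, not taken from the thesis: min-max normalization with tool-specific bounds `lo`/`hi` (the actual normalization method is unspecified), AUC interpreted as ROC AUC, AoI estimated as the overlap of 20-bin histograms, and KLD computed over those histograms with a small `eps`. All function names are hypothetical.

```python
import math


def min_max_normalize(scores, lo, hi):
    """Map raw scores onto [0, 1]; lo/hi are assumed tool-specific score bounds."""
    return [(s - lo) / (hi - lo) for s in scores]


def auc(genuine, imposter):
    """ROC AUC: probability a genuine score exceeds an imposter score (ties count 0.5)."""
    wins = 0.0
    for g in genuine:
        for i in imposter:
            wins += 1.0 if g > i else 0.5 if g == i else 0.0
    return wins / (len(genuine) * len(imposter))


def eer(genuine, imposter):
    """Approximate EER: error rate at the candidate threshold where FAR and FRR are closest."""
    best_gap, best_rate = float("inf"), None
    for t in sorted(set(genuine) | set(imposter)):
        far = sum(i >= t for i in imposter) / len(imposter)  # imposters accepted
        frr = sum(g < t for g in genuine) / len(genuine)  # genuine users rejected
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate


def histogram(scores, bins, lo=0.0, hi=1.0):
    """Probability mass per bin over [lo, hi]; out-of-range scores are clamped to the end bins."""
    counts = [0] * bins
    for s in scores:
        k = int((s - lo) / (hi - lo) * bins)
        counts[max(0, min(k, bins - 1))] += 1
    return [c / len(scores) for c in counts]


def area_of_intersection(genuine, imposter, bins=20):
    """Overlap of two score distributions: sum of per-bin minima (0 = disjoint, 1 = identical)."""
    pg, pi = histogram(genuine, bins), histogram(imposter, bins)
    return sum(min(a, b) for a, b in zip(pg, pi))


def kld(p_scores, q_scores, bins=20, eps=1e-10):
    """KL divergence D(P || Q) between histogram estimates; eps avoids division by zero and log(0)."""
    p, q = histogram(p_scores, bins), histogram(q_scores, bins)
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


if __name__ == "__main__":
    # Toy, perfectly separated scores on a hypothetical 0-10 raw scale.
    gen = min_max_normalize([8.0, 9.0, 7.5, 9.5], lo=0.0, hi=10.0)
    imp = min_max_normalize([2.0, 3.5, 7.0, 1.0], lo=0.0, hi=10.0)
    print(auc(gen, imp), eer(gen, imp), area_of_intersection(gen, imp))  # 1.0 0.0 0.0
```

Under this reading, a better matcher shows higher AUC, lower EER, and lower AoI. The same `kld` call could, for example, compare one tool's normalized genuine scores with the other tool's normalized genuine scores, which is the kind of check the abstract describes for judging whether scores are comparable after normalization.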

The nuisance factor that most impacted match scores on both V and P was downsampling, especially when the sample rate was reduced to 4 kHz from an original rate of 44 kHz or 48 kHz. For P, downsampling compounded with environmental noise, input device quality, and unscripted data produced the worst AUC, EER, and AoI. For V, the worst performance metrics came from downsampled, unscripted data recorded on the highest-quality input device.
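The abstract does not state which resampling method was used. The sketch below assumes a conventional approach to illustrate why a 4 kHz target is so damaging: the signal is low-pass filtered at the new Nyquist frequency (2 kHz for a 4 kHz rate) and then every 12th sample is kept to go from 48 kHz to 4 kHz, so any speech content above 2 kHz, such as higher formants and fricative energy, is discarded. The function names, the 101-tap Hamming-windowed sinc filter, and the `Fraction`-based ratio helper are illustrative assumptions. The 44 kHz case is taken here to mean 44.1 kHz, which would need a rational 40/441 resampler rather than the integer decimation shown.

```python
import math
from fractions import Fraction


def resample_ratio(orig_hz, target_hz):
    """Reduced (up, down) factors, e.g. 48000 -> 4000 gives (1, 12)."""
    r = Fraction(target_hz, orig_hz)
    return r.numerator, r.denominator


def lowpass_taps(cutoff, num_taps=101):
    """Hamming-windowed sinc low-pass FIR; cutoff is in cycles/sample, in (0, 0.5)."""
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        k = n - m / 2
        h = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)  # Hamming window
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def decimate(signal, factor, num_taps=101):
    """Anti-alias filter at the new Nyquist rate, then keep every `factor`-th sample."""
    taps = lowpass_taps(0.5 / factor, num_taps)
    half = (num_taps - 1) // 2
    out = []
    for i in range(0, len(signal), factor):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + half - j  # centered (zero-phase) convolution with symmetric taps
            if 0 <= idx < len(signal):
                acc += t * signal[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    fs = 48000
    up, down = resample_ratio(fs, 4000)  # (1, 12): integer decimation is enough
    for f in (1000, 3000):  # 3 kHz lies above the new 2 kHz Nyquist limit
        x = [math.sin(2 * math.pi * f * n / fs) for n in range(fs // 10)]  # 0.1 s tone
        y = decimate(x, down)[10:-10]  # trim filter edge effects
        rms = math.sqrt(sum(v * v for v in y) / len(y))
        print(f, "Hz tone RMS at 4 kHz:", round(rms, 3))
```

Run as a script, the 1 kHz tone should keep an RMS near 0.707, while the 3 kHz tone, which would otherwise alias down to 1 kHz, is strongly attenuated. A library resampler would normally replace this hand-written filter in practice; it is spelled out here only to make the filtering step visible.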
