Author ORCID Identifier

https://orcid.org/0009-0006-1886-4861

Semester

Fall

Date of Graduation

2024

Document Type

Thesis

Degree Type

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Jeremy Dawson

Committee Member

Nima Karimian

Committee Member

Prashnna Gyawali

Abstract

Speaker recognition is not a new biometric modality, but many obstacles remain before it can be as widely used as fingerprint, face, and iris recognition. Real-world environmental conditions, hardware device variations, and human behavior present serious challenges to the use of opportunistic voice or speaker samples for identification purposes. Common non-idealities, referred to here as nuisance factors, include environmental noise, input device quality, length of utterance, sample rate variation, and unscripted data, all of which can impact speaker recognition match score performance. The impact of these nuisance factors was evaluated using multiple ‘black-box’ speaker recognition software tools. Results show that the Phonexia Voice Inspector (P) matching tool outperformed VeriSpeak from Neurotechnology (V) on all three performance metrics of Area Under the Curve (AUC), Equal Error Rate (EER), and Area of Intersection (AoI) on a Probability Density Function (PDF) graph. The final performance metric, Kullback-Leibler Divergence (KLD), shows similar scores across all genuine distributions and across all imposter distributions, demonstrating that comparisons between scores from the two software tools were fair after score normalization.

The nuisance factor that most impacted the match scores on both V and P was downsampling, especially when the sample rate was reduced to 4 kHz from an original sample rate of either 44 kHz or 48 kHz. For P, downsampling compounded with environmental noise, input device quality, and the introduction of unscripted data produced the worst AUC, EER, and AoI. For V, the worst performance metrics came from downsampling combined with unscripted data from the highest quality input device.
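
The four metrics named in the abstract can be illustrated with a short sketch. This is not the thesis's implementation: the function names, the histogram-based density estimate, the bin count, and the convention that a higher score means a better match are all assumptions made for illustration, and the example scores are invented.

```python
import numpy as np

def far_frr(genuine, impostor):
    """False accept / false reject rates, accepting when score >= threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    scores = np.unique(np.concatenate([genuine, impostor]))  # sorted, unique
    thresholds = np.append(scores, scores[-1] + 1.0)  # last one rejects everything
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    return far, frr

def auc(genuine, impostor):
    """AUC as the probability that a genuine score outranks an impostor score."""
    g = np.asarray(genuine, dtype=float)[:, None]
    i = np.asarray(impostor, dtype=float)[None, :]
    return float((g > i).mean() + 0.5 * (g == i).mean())  # ties count half

def eer(genuine, impostor):
    """Approximate EER: operating point where FAR and FRR are closest."""
    far, frr = far_frr(genuine, impostor)
    k = int(np.argmin(np.abs(far - frr)))
    return float((far[k] + frr[k]) / 2.0)

def _histogram_pdfs(a, b, bins):
    """Density histograms of two score sets on a shared grid (assumed estimator)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), bins + 1)
    p, _ = np.histogram(a, bins=edges, density=True)
    q, _ = np.histogram(b, bins=edges, density=True)
    return p, q, edges[1] - edges[0]

def area_of_intersection(genuine, impostor, bins=50):
    """Overlap area of the genuine and impostor PDFs (0 = fully separated)."""
    p, q, width = _histogram_pdfs(genuine, impostor, bins)
    return float(np.minimum(p, q).sum() * width)

def kl_divergence(scores_a, scores_b, bins=50, eps=1e-10):
    """KL(A || B) between two score distributions, e.g. the same class of
    scores from two different tools after normalization."""
    p, q, width = _histogram_pdfs(scores_a, scores_b, bins)
    p = p * width + eps  # bin probabilities, smoothed to avoid log(0)
    q = q * width + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Invented scores, for illustration only.
genuine_scores = [0.80, 0.90, 0.95]
impostor_scores = [0.10, 0.20, 0.30]
print(auc(genuine_scores, impostor_scores))
print(eer(genuine_scores, impostor_scores))
print(area_of_intersection(genuine_scores, impostor_scores, bins=10))
```

Under these assumptions, well-separated genuine and impostor scores give a high AUC together with a low EER and a low AoI, while a KLD near zero between two tools' distributions of the same type indicates that their normalized scores are distributed similarly.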

Recommended Citation

Meighen, Ethan David, "Effect of Specific Data Variations on Automated Speaker Recognition" (2024). Graduate Theses, Dissertations, and Problem Reports. 12710.
https://researchrepository.wvu.edu/etd/12710

Included in

Medical Biomathematics and Biometrics Commons, Numerical Analysis and Scientific Computing Commons

DOI

https://doi.org/10.33915/etd.12710

Graduate Theses, Dissertations, and Problem Reports

Effect of Specific Data Variations on Automated Speaker Recognition