Graduate Theses, Dissertations, and Problem Reports

Automated Cleaning of Identity Label Noise in A Large-scale Face Dataset Using A Face Image Quality Control

Mohamad Al jazaery

Semester

Fall

Date of Graduation

2018

Document Type

Thesis

Degree Type

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Guodong Guo

Committee Co-Chair

Donald Adjeroh

Committee Member

Donald Adjeroh

Committee Member

Xin Li

Abstract

In recent years, several very large-scale face recognition datasets have become publicly available. They are usually collected from the internet using search engines and therefore contain many faces with wrong identity labels (outliers). The face images in these datasets also vary in quality. Because low-quality face images are hard to identify, current automated identity label cleaning methods cannot detect identity label errors in low-quality faces. We therefore propose a novel approach for cleaning identity label errors in low-quality faces. The identity labels cleaned by our method can train better models for low-quality face recognition, a problem that is very common in real-life scenarios, where face images are usually captured by surveillance cameras in unconstrained conditions.

Our proposed method starts by defining, for each identity, a clean subset consisting of the top high-quality face images and the top search-ranked faces that have the identity label. We call this subset the "identity reference set". A "quality adaptive similarity threshold" is then applied to decide whether a face image from the original identity set is similar to the identity reference set (inlier) or not (outlier). Quality adaptive means that the threshold value applied to each face depends on its quality score. Because low-quality inlier faces carry less facial information and are likely to achieve a lower similarity score to the identity reference set than high-quality inlier faces, using a less strict threshold for low-quality faces saves them from being falsely classified as outliers.

In our low-to-high-quality face verification experiments, the deep model trained on our cleaned version of MS-Celeb-1M.v1 outperforms the same model trained on MS-Celeb-1M.v1 cleaned by the semantic bootstrapping method. We also apply our identity label cleaning method to a subset of the CACD face dataset, where our quality-based cleaning delivers higher precision and recall than a previous method.
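
The two steps described in the abstract (building an identity reference set, then applying a quality adaptive similarity threshold) can be illustrated with a minimal Python sketch. This is not the thesis implementation. Everything the abstract does not state is an assumption made for illustration: the function names, comparing each face to the centroid of the reference embeddings with cosine similarity, quality scores scaled to [0, 1], the linear threshold rule, and the values `k_quality`, `k_rank`, `t_low`, and `t_high`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def reference_set(qualities, search_ranks, k_quality=2, k_rank=2):
    """Indices of the top-quality faces plus the top search-ranked faces.

    k_quality and k_rank are hypothetical sizes, not values from the thesis.
    """
    n = len(qualities)
    by_quality = sorted(range(n), key=lambda i: -qualities[i])[:k_quality]
    by_rank = sorted(range(n), key=lambda i: search_ranks[i])[:k_rank]
    return sorted(set(by_quality) | set(by_rank))

def adaptive_threshold(quality, t_low=0.5, t_high=0.9):
    """Assumed rule: the threshold grows linearly with quality in [0, 1],
    so low-quality faces are judged less strictly."""
    q = min(max(quality, 0.0), 1.0)
    return t_low + (t_high - t_low) * q

def clean_identity(embeddings, qualities, search_ranks):
    """Return (reference indices, inlier flags) for one identity."""
    ref = reference_set(qualities, search_ranks)
    dim = len(embeddings[0])
    # Assumption: the reference set is summarized by its mean embedding.
    centroid = [sum(embeddings[i][d] for i in ref) / len(ref) for d in range(dim)]
    inliers = [cosine(e, centroid) >= adaptive_threshold(q)
               for e, q in zip(embeddings, qualities)]
    return ref, inliers

# Toy data: faces 2 and 3 have the same embedding but different quality.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.7, 0.7], [0.7, 0.7], [0.0, 1.0]]
qualities = [0.95, 0.90, 0.10, 0.80, 0.50]
search_ranks = [1, 2, 3, 4, 5]  # 1 = best search rank

ref, inliers = clean_identity(embeddings, qualities, search_ranks)
print(ref, inliers)
```

In this toy data, faces 2 and 3 have the same similarity to the reference set, but only the low-quality face 2 is kept as an inlier, because its threshold is lower. That is the behavior the abstract attributes to the quality adaptive threshold.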

Recommended Citation

Al jazaery, Mohamad, "Automated Cleaning of Identity Label Noise in A Large-scale Face Dataset Using A Face Image Quality Control" (2018). Graduate Theses, Dissertations, and Problem Reports. 3700.
https://researchrepository.wvu.edu/etd/3700

Embargo Reason

Publication Pending

Included in

Computational Engineering Commons

DOI

https://doi.org/10.33915/etd.3700
