Date of Graduation


Document Type


Degree Type



Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Guodong Guo

Committee Co-Chair

Donald Adjeroh

Committee Member

Donald Adjeroh

Committee Member

Xin Li


In recent years, several very large-scale face recognition datasets have become publicly available. They are usually collected from the internet using search engines and therefore contain many faces with wrong identity labels (outliers). Additionally, the face images in these datasets vary in quality. Because low-quality face images are hard to identify, current automated identity label cleaning methods are not able to detect identity label errors in low-quality faces. We therefore propose a novel approach for cleaning identity label errors in low-quality faces. The face identity labels cleaned by our method can train better models for low-quality face recognition. Low-quality face recognition is very common in real-life scenarios, where face images are usually captured by surveillance cameras in unconstrained conditions.

Our proposed method starts by defining, for each identity, a clean subset consisting of the top high-quality face images and the top search-ranked faces that have the identity label. We call this set the "identity reference set". A "quality adaptive similarity threshold" is then applied to decide whether a face image from the original identity set is similar to the identity reference set (inlier) or not (outlier). The quality adaptive similarity threshold uses different threshold values for faces depending on their quality scores. Because inlier low-quality faces carry less facial information and are likely to achieve lower similarity scores to the identity reference set than high-quality inlier faces, applying a less strict threshold to low-quality faces saves them from being falsely classified as outliers.

In our low-to-high-quality face verification experiments, the deep model trained on our cleaning results of MS-Celeb-1M.v1 outperforms the same model trained using MS-Celeb-1M.v1 cleaned by the semantic bootstrapping method. We also apply our identity label cleaning method to a subset of the CACD face dataset, where our quality-based cleaning achieves higher precision and recall than a previous method.
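
The two steps described in the abstract (building an identity reference set, then applying a quality adaptive similarity threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the use of cosine similarity against the centroid of the reference embeddings, the linear interpolation between a lenient threshold `t_low` and a strict threshold `t_high`, the parameter values, the toy 2-D embeddings, and all function names.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def reference_set(qualities, search_ranks, k_quality=2, k_rank=2):
    """Indices of the top-quality faces plus the top search-ranked faces.

    Hypothetical helper: k_quality and k_rank are illustrative parameters.
    A lower search rank means the image appeared earlier in the results.
    """
    idx = range(len(qualities))
    by_quality = sorted(idx, key=lambda i: -qualities[i])[:k_quality]
    by_rank = sorted(idx, key=lambda i: search_ranks[i])[:k_rank]
    return sorted(set(by_quality) | set(by_rank))


def adaptive_threshold(quality, t_low=0.3, t_high=0.6):
    """Assumed linear rule: low-quality faces get a less strict threshold."""
    q = min(max(quality, 0.0), 1.0)  # clamp the quality score to [0, 1]
    return t_low + (t_high - t_low) * q


def clean_identity(embeddings, qualities, search_ranks):
    """Return (reference indices, similarities, inlier flags) for one identity."""
    ref = reference_set(qualities, search_ranks)
    dim = len(embeddings[0])
    # Centroid of the unit-normalized reference embeddings (an assumption).
    unit = [normalize(embeddings[i]) for i in ref]
    centroid = [sum(u[d] for u in unit) / len(unit) for d in range(dim)]
    sims = [cosine(e, centroid) for e in embeddings]
    inliers = [s >= adaptive_threshold(q) for s, q in zip(sims, qualities)]
    return ref, sims, inliers


# Toy data for a single identity: four faces with 2-D embeddings.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.866], [0.0, 1.0]]
qualities = [0.95, 0.90, 0.20, 0.85]   # face 2 is a low-quality image
search_ranks = [1, 2, 4, 3]

ref, sims, inliers = clean_identity(embeddings, qualities, search_ranks)
for i, (s, q, keep) in enumerate(zip(sims, qualities, inliers)):
    print(f"face {i}: sim={s:.3f} quality={q:.2f} "
          f"threshold={adaptive_threshold(q):.3f} inlier={keep}")
```

In this toy example the low-quality face 2 is only moderately similar to the reference set, so it would be rejected by the strict threshold `t_high` but is kept under the more lenient threshold assigned to its quality score, while the high-quality but dissimilar face 3 is still flagged as an outlier.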

Embargo Reason

Publication Pending