Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Xin Li

Committee Co-Chair

Ruizhe Wang

Committee Member

Ruizhe Wang

Committee Member

Matthew Valenti

Committee Member

Donald Adjeroh

Committee Member

Natalia Schmid


Face representation learning is one of the most popular research topics in the computer vision community, as it is the foundation of face recognition and face image generation. Numerous representation learning frameworks have been integrated into everyday applications such as face recognition, image editing, and face tracking. Researchers have developed advanced face recognition algorithms that have become successful commercial products, for example, Face ID on smartphones. With the help of large-scale datasets and advanced computational resources, state-of-the-art face recognition performance is constantly being updated and is approaching saturation. Building on the robust representations achieved in face recognition, this dissertation concentrates on face image editing and face tracking from a representation learning perspective, and addresses several face image editing problems with dedicated frameworks, including semantic beauty mining, beautification, gender swap, and PIE (pose, identity, expression) manipulation.

The first work learns to represent beauty in face images. Mining beauty factors is a crucial step in beautifying a face image. We therefore present a novel study on mining the beauty semantics of facial attributes from big data, in an attempt to construct objective, quantitative descriptions of beauty. First, we deploy a deep convolutional neural network (CNN) to extract facial attributes. Then we investigate the correlations between these attributes and attractiveness on two large-scale datasets labeled with beauty scores. Finally, building on the findings of beauty semantic mining and recent advances in style-based synthesis, we propose a novel representation learning framework for face beautification. Given a reference face with a high beauty score, our GAN-based architecture translates a query face into a sequence of beautified face images guided by the reference face.
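The attribute-attractiveness correlation step can be illustrated with a minimal sketch. It assumes each facial attribute has already been extracted as a per-image score (for example, a CNN classifier's output probability) and uses the Pearson coefficient as the association measure; the statistic actually used in the study, the attribute names, and the data are not specified here, and the toy values and names (`attr_a`, `attr_b`, `attr_c`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant attribute carries no correlation signal
    return cov / (sx * sy)

def rank_attributes(attributes: Dict[str, List[float]],
                    beauty: List[float]) -> List[Tuple[str, float]]:
    """Rank attributes by the absolute value of their correlation with beauty scores."""
    scores = [(name, pearson(values, beauty)) for name, values in attributes.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Toy data: five images, one beauty score and three attribute scores per image.
beauty = [3.0, 2.0, 4.0, 5.0, 1.0]
attributes = {
    "attr_a": [3.0, 2.0, 4.0, 5.0, 1.0],
    "attr_b": [1.0, 1.0, 1.0, 1.0, 1.0],
    "attr_c": [5.0, 4.0, 3.0, 2.0, 1.0],
}
ranked = rank_attributes(attributes, beauty)
```

Ranking by absolute correlation keeps attributes that are strongly but negatively associated with attractiveness near the top, which matters when the goal is to describe what beauty is and is not associated with.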

The second work addresses gender representation learning. Motivated by fairness considerations, we propose a generative framework that transfers gender without changing identity: it captures gender-related representations from face images and generates a different-gender counterpart of the original image by swapping these gender representations. Our key contributions include: 1) an architecture with specially tailored loss functions in the feature space for face gender transfer; 2) a novel probabilistic gender mask that helps achieve both objectives, gender transfer and identity preservation; and 3) the identification of sparse features (~20 out of 256) uniquely responsible for face gender perception.
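One plausible reading of a probabilistic gender mask is a per-dimension probability that a feature is gender-related, used to blend features from a source face with those of a different-gender reference face. The following is a hypothetical sketch of that idea only, not the dissertation's architecture: the sigmoid parameterization, the blending rule, the function names, and the toy 256-dimensional features with 20 marked dimensions are all assumptions chosen to mirror the numbers quoted above.

```python
import math
from typing import List

def gender_mask(logits: List[float]) -> List[float]:
    """Hypothetical: turn per-dimension logits into probabilities in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]

def swap_gender_features(src: List[float], ref: List[float],
                         mask: List[float]) -> List[float]:
    """Take gender-related dims (mask near 1) from ref; keep the rest (mask near 0) from src."""
    if not (len(src) == len(ref) == len(mask)):
        raise ValueError("src, ref, and mask must have the same length")
    return [m * r + (1.0 - m) * s for s, r, m in zip(src, ref, mask)]

def top_k_dims(mask: List[float], k: int) -> List[int]:
    """Indices of the k dimensions most likely to be gender-related."""
    return sorted(range(len(mask)), key=lambda i: mask[i], reverse=True)[:k]

# Toy example: 256-dim features where only the first 20 dims are marked gender-related.
DIM, K = 256, 20
logits = [6.0 if i < K else -6.0 for i in range(DIM)]
mask = gender_mask(logits)
src_feat = [0.0] * DIM  # stand-in for the source face's features
ref_feat = [1.0] * DIM  # stand-in for a different-gender reference face's features
swapped = swap_gender_features(src_feat, ref_feat, mask)
```

A soft mask keeps the blend differentiable during training, while `top_k_dims` shows how a sparse set of responsible dimensions could be read off afterwards.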

To maximize image quality, we propose a high-fidelity face manipulation architecture. Rapid advances in face manipulation have demonstrated the feasibility of swapping identities and transferring styles. However, achieving both objectives in high-fidelity (1024×1024) face manipulation has remained an open challenge due to their intrinsically conflicting requirements and high computational demands. We propose to learn face disentanglement for high-fidelity photorealistic facial synthesis with precise control over the latent representations of a triplet of attributes: pose, identity, and expression (PIE). We leverage efficient, state-of-the-art neural network architectures for 3D face geometry and identity feature extraction, and map the extracted features to style codes in the latent space of pre-trained StyleGAN generators. Decoupling feature encoding from image synthesis allows us to combine the generalization power of style-based encoding with the rich expressiveness of generative face models, without the burden of training them on millions of images.
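The decoupled encoding idea can be sketched as follows: pose, identity, and expression features, possibly taken from different faces, are concatenated and mapped to a stack of per-layer style vectors that a pre-trained generator would consume. This is an assumption-laden illustration: the feature sizes are toy values, the random linear map (`LinearStyleMapper`, a hypothetical name) stands in for a learned mapping network, and the default of 18 layers mirrors the number of style inputs of a 1024×1024 StyleGAN generator; no real encoder or generator is called.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class PIE:
    pose: List[float]        # toy stand-in for head-pose parameters
    identity: List[float]    # toy stand-in for an identity embedding
    expression: List[float]  # toy stand-in for expression coefficients

def recombine(pose_from: PIE, identity_from: PIE, expression_from: PIE) -> List[float]:
    """Concatenate pose, identity, and expression features taken from (possibly) different faces."""
    return pose_from.pose + identity_from.identity + expression_from.expression

class LinearStyleMapper:
    """Hypothetical stand-in for a learned mapping network: a fixed random linear map
    from a PIE feature vector to num_layers style vectors of size style_dim."""

    def __init__(self, in_dim: int, style_dim: int, num_layers: int = 18, seed: int = 0):
        rng = random.Random(seed)
        self.in_dim, self.style_dim, self.num_layers = in_dim, style_dim, num_layers
        self.weight = [
            [rng.gauss(0.0, in_dim ** -0.5) for _ in range(in_dim)]
            for _ in range(style_dim * num_layers)
        ]

    def __call__(self, feat: List[float]) -> List[List[float]]:
        if len(feat) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} features, got {len(feat)}")
        flat = [sum(w * x for w, x in zip(row, feat)) for row in self.weight]
        # Split the flat output into one style vector per generator layer.
        return [flat[i * self.style_dim:(i + 1) * self.style_dim]
                for i in range(self.num_layers)]

face_a = PIE(pose=[0.1, 0.0, -0.2], identity=[0.5, 0.1, 0.3, 0.9], expression=[0.0, 0.0])
face_b = PIE(pose=[0.0, 0.3, 0.0], identity=[0.2, 0.8, 0.4, 0.1], expression=[0.7, 0.2])
# Face B's identity and expression rendered with face A's pose.
feat = recombine(pose_from=face_a, identity_from=face_b, expression_from=face_b)
mapper = LinearStyleMapper(in_dim=len(feat), style_dim=8)
style_code = mapper(feat)  # 18 lists of 8 floats
```

Because only the mapper would be trained while the encoders and generator stay frozen, swapping one attribute amounts to swapping one slice of the input vector.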

Finally, we extend 2D face representation learning to 3D faces by disentangling expression representations from 3D face representations. As a result, we build a systematic avatar animation application capable of driving 3D avatar expressions.
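A common way to separate expression from the rest of a 3D face is a linear model with distinct identity and expression bases, in the style of a 3D morphable model; expression coefficients estimated on a user's face can then drive an avatar whose expression bases share the same semantics. The following sketch assumes such a linear blendshape model; whether the dissertation uses this exact formulation is not stated here, and `blend` and `retarget_expression` are hypothetical names.

```python
from typing import List

Vector = List[float]

def blend(neutral: Vector, bases: List[Vector], coeffs: Vector) -> Vector:
    """Linear shape model: shape = neutral + sum_k coeffs[k] * bases[k]."""
    if len(bases) != len(coeffs):
        raise ValueError("one coefficient per basis is required")
    out = list(neutral)
    for basis, c in zip(bases, coeffs):
        if len(basis) != len(neutral):
            raise ValueError("basis size must match the neutral shape")
        for i, b in enumerate(basis):
            out[i] += c * b
    return out

def retarget_expression(expr_coeffs: Vector, avatar_neutral: Vector,
                        avatar_expr_bases: List[Vector]) -> Vector:
    """Apply expression coefficients estimated on a user's face to an avatar
    that has semantically matching expression bases (assumed setup)."""
    return blend(avatar_neutral, avatar_expr_bases, expr_coeffs)

# Toy avatar: 3 flattened vertex coordinates and two expression bases.
avatar_neutral = [0.0, 0.0, 0.0]
avatar_expr_bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
user_expr_coeffs = [0.5, 2.0]  # assumed output of a 3D face-fitting step
driven = retarget_expression(user_expr_coeffs, avatar_neutral, avatar_expr_bases)
```

Only the expression coefficients cross from user to avatar; the user's identity coefficients are discarded, which is what lets the avatar keep its own shape while mimicking the user's expressions.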

Embargo Reason

Publication Pending