Author ORCID Identifier
Semester
Fall
Date of Graduation
2022
Document Type
Dissertation
Degree Type
PhD
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Xin Li
Committee Co-Chair
Ruizhe Wang
Committee Member
Matthew Valenti
Committee Member
Donald Adjeroh
Committee Member
Natalia Schmid
Abstract
Face representation learning is one of the most popular research topics in the computer vision community, as it is the foundation of face recognition and face image generation. Numerous representation learning frameworks have been integrated into everyday applications such as face recognition, image editing, and face tracking. Researchers have developed advanced face recognition algorithms that have become successful commercial products, for example, Face ID on smartphones. The performance record on face recognition benchmarks is constantly being updated and is approaching saturation with the help of large-scale datasets and advanced computational resources. Building on the robust representations learned for face recognition, this dissertation concentrates on face image editing and face tracking from a representation learning perspective, addressing several face image editing problems via task-specific frameworks: semantic beauty mining, beautification, gender swapping, and PIE (pose, identity, expression) manipulation.
The first work learns to represent beauty in face images. Mining the beauty factor is a crucial step in beautifying a face image. We therefore present a novel study on mining the beauty semantics of facial attributes from big data, in an attempt to objectively construct quantitative descriptions of beauty. First, we deploy a deep convolutional neural network (CNN) to extract facial attributes. Then we investigate the correlations between these attributes and attractiveness on two large-scale datasets labeled with beauty scores. Finally, we propose a novel representation learning framework for face beautification, building on the findings of beauty semantic mining and the latest advances in style-based synthesis. Given a reference face with a high beauty score, our GAN-based architecture translates an inquiry face into a sequence of beautified face images guided by the reference.
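The correlation-mining step described above can be sketched as computing a per-attribute Pearson correlation against human-rated beauty scores. This is a minimal illustrative sketch, not the dissertation's pipeline: the attribute values and scores below are toy data, and real attribute activations would come from the CNN extractor.

```python
import numpy as np

def attribute_beauty_correlations(attributes, scores):
    """Pearson correlation between each facial attribute and beauty scores.

    attributes: (n_faces, n_attrs) matrix of attribute activations
    scores:     (n_faces,) vector of human-rated beauty scores
    Returns a (n_attrs,) vector of correlation coefficients.
    """
    # Standardize each attribute column and the score vector, then the
    # mean of the elementwise product is exactly the Pearson correlation.
    a = (attributes - attributes.mean(axis=0)) / attributes.std(axis=0)
    s = (scores - scores.mean()) / scores.std()
    return a.T @ s / len(s)

# Toy example: 4 faces, 2 hypothetical attributes — the first co-varies
# with the beauty score, the second varies against it.
attrs = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]])
beauty = np.array([4.5, 4.0, 2.0, 1.5])
corr = attribute_beauty_correlations(attrs, beauty)
```

Attributes with large positive coefficients are candidates for beauty-relevant semantics, which the beautification framework can then emphasize.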
The second work addresses gender representation learning. Motivated by fairness considerations, we propose a generative framework that transfers gender without changing identity: it captures gender-related representations from face images and generates an opposite-gender counterpart of the original image by swapping the gender representations. Our key contributions include: 1) an architecture design with specially tailored loss functions in the feature space for face gender transfer; 2) a novel probabilistic gender mask that facilitates achieving both gender transfer and identity preservation; and 3) the identification of sparse features (~20 out of 256) uniquely responsible for face gender perception.
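The probabilistic gender mask above can be thought of as a per-dimension soft gate over latent features: gender-carrying dimensions are taken from an opposite-gender reference while identity dimensions stay untouched. The sketch below is a hedged illustration of that idea with hypothetical dimensions; the actual mask in the dissertation is learned, not hand-set.

```python
import numpy as np

def swap_gender_features(feat_src, feat_ref, gender_mask):
    """Blend source and reference features under a per-dimension mask.

    feat_src, feat_ref: (d,) latent feature vectors (e.g. d = 256)
    gender_mask: (d,) probabilities in [0, 1]; dimensions with mask near 1
    carry gender information and are taken from the reference, while the
    remaining (identity) dimensions are kept from the source.
    """
    return (1.0 - gender_mask) * feat_src + gender_mask * feat_ref

# Toy example: 8-dim features; dims 2 and 5 flagged as gender-carrying.
src = np.zeros(8)          # stands in for the source face's features
ref = np.ones(8)           # stands in for the reference face's features
mask = np.zeros(8)
mask[[2, 5]] = 1.0         # sparse mask, analogous to ~20 of 256 dims
swapped = swap_gender_features(src, ref, mask)
```

Because the mask is sparse, most dimensions pass through unchanged, which is what preserves identity while the few gated dimensions flip the perceived gender.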
To maximize image quality, we propose a high-fidelity face manipulation architecture. Rapid advances in face manipulation have demonstrated the feasibility of swapping identities and transferring styles. However, achieving both objectives for high-fidelity (1024×1024) face manipulation has remained an open challenge due to their intrinsically conflicting requirements and high computational demands. We propose to learn face disentanglement for high-fidelity photorealistic facial synthesis with precise control over latent representations of the attribute triplet: pose, identity, and expression (PIE). We leverage efficient, state-of-the-art neural network architectures for 3D face geometry and identity feature extraction, and map their outputs to style codes in the latent space of pre-trained StyleGAN generators. Decoupling feature encoding from image synthesis lets us combine the generalization power of style-based encoding with the rich expressiveness of generative face models, without the burden of training them on millions of images.
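The mapping step described above — fusing separately encoded identity features and 3D geometry (pose/expression) coefficients into a style code consumed by a pre-trained generator — can be sketched as a small mapping network. Everything below is illustrative: the dimensions, the random weights, and the single-hidden-layer shape are assumptions standing in for the actual trained mapper, not the dissertation's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the dissertation's exact sizes):
# 512-dim identity embedding, 257-dim 3DMM-style pose/expression
# coefficients, 512-dim style code as in StyleGAN's W space.
D_ID, D_3DMM, D_STYLE, D_HID = 512, 257, 512, 1024

def map_to_style(id_feat, pose_exp_coeffs, W1, W2):
    """Toy mapping network: fuse the decoupled encodings into one
    style code that a pre-trained generator could consume."""
    z = np.concatenate([id_feat, pose_exp_coeffs])  # decoupled encodings
    h = np.maximum(W1 @ z, 0.0)                     # ReLU hidden layer
    return W2 @ h                                   # style code w

# Random stand-in weights; in practice these would be trained so that
# editing one input (e.g. pose) changes only that factor in the output.
W1 = rng.standard_normal((D_HID, D_ID + D_3DMM)) * 0.01
W2 = rng.standard_normal((D_STYLE, D_HID)) * 0.01
w = map_to_style(rng.standard_normal(D_ID), rng.standard_normal(D_3DMM), W1, W2)
```

The design point this illustrates is the decoupling: the identity and geometry encoders can be swapped or reused independently, and only the lightweight mapper must be trained, rather than the full image synthesizer.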
Finally, we extend 2D face representation learning to 3D faces by decomposing expression representations from the 3D representations. Building on this, we develop a systematic avatar animation application capable of driving 3D avatar expressions.
Recommended Citation
Liu, Xudong, "Face Representation Learning and Its Applications: from Image Editing to 3D Avatar Animation" (2022). Graduate Theses, Dissertations, and Problem Reports. 11471.
https://researchrepository.wvu.edu/etd/11471
Embargo Reason
Publication Pending
Included in
Computational Engineering Commons, Computer and Systems Architecture Commons, Other Computer Engineering Commons