Date of Graduation
Statler College of Engineering and Mineral Resources
Lane Department of Computer Science and Electrical Engineering
Long-duration visual tracking of people requires the ability to link track snippets (a.k.a. tracklets) based on the identity of people. In the absence of motion priors or hard biometrics (e.g., face, fingerprint, or iris), the common practice is to leverage soft biometrics to match tracklets corresponding to the same person in different sightings. A common choice is the whole-body visual appearance of the person, as determined by clothing, which is assumed not to change during tracking. The problem is challenging because distinct images of the same person may look very different: no restrictions are imposed on nuisance factors of variation such as pose, illumination, viewpoint, background, and sensor noise. The resulting high intra-class variance leaves this human identification task prone to high mismatch rates.
We introduce and study models for learning representations for human identification that aim at reducing the effects of nuisance factors. First, we introduce a modeling framework based on learning a low-rank representation, which can be applied to face as well as whole-body images. The goal is not only to learn invariant representations for each identity, but also to promote a uniform inter-class separation to further reduce mismatch rates. Another advantage of the approach is a fast procedure for computing and comparing invariant representations for recognition and re-identification. Second, we introduce a learning framework for fusing representations of multiple biometrics for human identification. We focus on the face modality and clothing appearance, and develop a representation fusion approach based on the Information Bottleneck method.
In the last part of the dissertation, we improve person re-identification by decreasing the effects of nuisance factors via multi-task learning. We design and combine improved versions of classification and distance metric losses. Classification losses perform better when restrictions are imposed on the computation of their outputs, but this makes them harder to train. We mitigate this by investigating combinations of multiple tasks, such as attribute and metric learning, that might regularize the training while improving performance. Finally, we also include the explicit modeling of nuisance factors such as pose, to further improve the invariance of representations. For each model, we show the benefits of the proposed methods by characterizing their performance on publicly available benchmarks, and by comparing them with the state of the art.
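As a rough illustration of combining a classification loss with a distance metric loss, the sketch below adds a cross-entropy term over scaled cosine similarities to a triplet margin term. It is not the dissertation's formulation: reading the "restriction" on classification outputs as L2 normalization of embeddings and class weights is an assumption, and the function names, `scale`, `margin`, and `metric_weight` are hypothetical choices made only for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_softmax_loss(embedding, class_weights, label, scale=16.0):
    """Cross-entropy over scaled cosine similarities.

    Assumption: the output restriction is modeled by normalizing both the
    embedding and each class weight vector before taking the dot product.
    """
    e = l2_normalize(embedding)
    logits = [
        scale * sum(a * b for a, b in zip(e, l2_normalize(w)))
        for w in class_weights
    ]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on Euclidean distances: pull the positive closer than the negative."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)


def multi_task_loss(anchor, positive, negative, class_weights, label,
                    metric_weight=1.0):
    """Weighted sum of the classification term and the metric term."""
    cls = cosine_softmax_loss(anchor, class_weights, label)
    met = triplet_loss(l2_normalize(anchor),
                       l2_normalize(positive),
                       l2_normalize(negative))
    return cls + metric_weight * met
```

In this toy setting, an anchor that aligns with its own class weight and lies closer to the positive than to the negative yields a loss near zero, while a wrong label makes the classification term dominate. An attribute or pose task would enter in the same way, as a further weighted term in the sum.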
Sabri, Sinan, "Learning Representations for Human Identification" (2022). Graduate Theses, Dissertations, and Problem Reports. 11254.