Date of Graduation
2017
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Thirimachos Bourlai
Committee Co-Chair
Jeremy Dawson
Committee Member
Matthew Valenti
Abstract
Internet shopping has spread widely and now extends into social networking. Someone may want to buy a shirt, accessories, or other items seen in a random picture or a streaming video. This thesis takes up the problem of automatic classification, constraining the target to jerseys in the wild and assuming the object has already been detected.
A dataset of 7,840 jersey images, named JerseyXIV, is created. It contains images in 14 categories of football jersey types (Home and Alternate) belonging to 10 teams of the 2015 Big 12 Conference football season. The quality of the images varies in terms of pose, standoff distance, level of occlusion, and illumination. Due to copyright restrictions on certain images, unaltered original images with appropriate credits can be provided upon request.
Various conventional and deep learning based classification approaches were empirically designed, optimized, and tested. Of these, a train-time fused Convolutional Neural Network (CNN) architecture, named CNN-F, achieved the highest classification accuracy, 92.61%. The final solution combines three different CNNs through score-level average fusion and achieves 96.90% test accuracy. To test these trained CNN models on a larger, application-oriented scale, a video dataset is created, which may introduce a higher rate of occlusion and elements of transmission noise. It consists of 14 videos, one for each class, totaling 3,584 frames, of which 2,188 contain the object of interest. With manual detection, score-level average fusion achieved the highest classification accuracy, 81.31%.
In addition, three Image Quality Assessment techniques were tested to assess the drop in accuracy of the average-fusion method on the video dataset. The Natural Image Quality Evaluator (NIQE) index by Bovik et al., applied with a threshold of 0.40 on input images, improved the test accuracy of the average-fusion model on the video dataset to 86.36% by removing low-quality input images before they reach the CNN.
The thesis concludes that the recommended solution for classification combines data augmentation and fusion of networks, while for applying trained models to videos, an image quality metric improves performance at the cost of discarding some input data.
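As an illustration only, the two ideas named in the abstract, score-level average fusion and quality-based filtering of inputs, might be sketched as follows. This is not the thesis code: the function names (`average_fusion`, `predict`, `quality_gate`) are hypothetical, the example uses three classes and made-up scores rather than the 14 JerseyXIV classes, and the direction of the quality comparison (lower score means better quality) is an assumption that the abstract does not state.

```python
from typing import List, Sequence


def average_fusion(score_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class scores of several models (score-level fusion)."""
    if not score_sets:
        raise ValueError("need at least one score vector")
    n_classes = len(score_sets[0])
    if any(len(s) != n_classes for s in score_sets):
        raise ValueError("all score vectors must have the same length")
    # zip(*score_sets) groups the scores of each class across models.
    return [sum(col) / len(score_sets) for col in zip(*score_sets)]


def predict(score_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = average_fusion(score_sets)
    return max(range(len(fused)), key=fused.__getitem__)


def quality_gate(items, quality, threshold, lower_is_better=True):
    """Keep only items whose quality score passes the threshold.

    ASSUMPTION: by default a lower score means better quality; flip
    lower_is_better if the metric in use is oriented the other way.
    """
    if lower_is_better:
        return [x for x, q in zip(items, quality) if q <= threshold]
    return [x for x, q in zip(items, quality) if q >= threshold]


# Made-up class scores from three hypothetical CNNs for one image.
cnn_a = [0.6, 0.3, 0.1]
cnn_b = [0.2, 0.5, 0.3]
cnn_c = [0.1, 0.6, 0.3]
fused_scores = average_fusion([cnn_a, cnn_b, cnn_c])
predicted_class = predict([cnn_a, cnn_b, cnn_c])

# Made-up quality scores for three frames, gated at the 0.40 threshold.
kept_frames = quality_gate(["f1", "f2", "f3"], [0.20, 0.50, 0.39], 0.40)
```

In this toy input the first model alone would pick class 0, while the averaged scores favor class 1, which is the kind of disagreement fusion is meant to resolve. The gate runs before classification, so frames it rejects never reach the models; this corresponds to the trade-off in lost input data that the abstract mentions.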
Recommended Citation
Matta, Rohitha Reddy, "On Designing Deep Learning Approaches for Classification of Football Jersey Images in the Wild" (2017). Graduate Theses, Dissertations, and Problem Reports. 6178.
https://researchrepository.wvu.edu/etd/6178