Semester

Spring

Date of Graduation

2014

Document Type

Thesis

Degree Type

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Gianfranco Doretto

Committee Co-Chair

Hani Ammar

Committee Member

Natalia Schmie

Abstract

The automated recognition of human activities from video is a fundamental problem with applications in several areas, including video surveillance, robotics, smart healthcare, and multimedia indexing and retrieval. The pervasive diffusion of cameras capable of recording audio also makes a complementary modality available to those applications. Despite the sizable progress made in modeling and recognizing, from video, group activities and actions performed by people in isolation, the availability of audio cues has rarely been leveraged. This is even more so in the area of modeling and recognizing binary interactions between humans, where even the use of video has been limited.

This thesis introduces a modeling framework for binary human interactions based on audio and visual cues. The main idea is to describe an interaction with a spatio-temporal trajectory modeling the visual motion cues, and a temporal trajectory modeling the audio cues. This poses the problem of how to fuse temporal trajectories from multiple modalities for the purpose of recognition. We propose a solution whereby trajectories are modeled as the output of kernel state space models. We then develop kernel-based methods for audio-visual fusion that act at the feature level, as well as at the kernel level by exploiting multiple kernel learning techniques. The approaches have been extensively tested and evaluated on a dataset of videos obtained from TV shows and Hollywood movies, containing five different interactions. The results show the promise of this approach: the recognition rate improves significantly when audio cues are exploited, setting the state of the art in this particular application.

Recommended Citation

Almohsen, Ranya, "Human Interaction Recognition with Audio and Visual Cues" (2014). Graduate Theses, Dissertations, and Problem Reports. 533.
https://researchrepository.wvu.edu/etd/533

DOI

https://doi.org/10.33915/etd.533

Graduate Theses, Dissertations, and Problem Reports

Human Interaction Recognition with Audio and Visual Cues
