Date of Graduation


Document Type


Degree Type



Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Doretto Gianfranco

Committee Co-Chair

Adjeroh Donald

Committee Member

Li Xin


The goal of this work is the temporal localization and recognition of interactions between two people (binary interactions) in video. Human-human interaction detection is one of the core problems in video analysis, with applications in video surveillance, video search and retrieval, human-computer interaction, and behavior analysis for safety and security. Despite the sizeable literature on activity and action modeling and recognition, the vast majority of approaches assume that the beginning and the end of the video portion containing the action or activity of interest are known. In other words, while significant effort has been devoted to recognition, the spatial and temporal localization of activities, i.e. the detection problem, has received considerably less attention. This is even more true when detection must be performed online rather than offline. Nearly all state-of-the-art methods operate offline, which makes them intrinsically unsuited for real-time processing. In this thesis, the problem of event localization and recognition is addressed in an online fashion. The main assumption is that an interaction, or an activity, is modeled by a temporal sequence. One of the main challenges is the development of a modeling framework able to capture the complex variability of activities described by high-dimensional features. This is addressed by combining linear models with kernel methods. In particular, parity space theory for detection, which is based on Euclidean geometry, is extended to work with kernels through the use of geometric operators in Hilbert space. While this approach is general, here it is applied to the detection of human interactions. It is tested on a publicly available dataset and on a large and challenging, newly collected dataset.
Extensive testing indicates that the approach sets a new state of the art under several performance measures, and that it holds promise as an effective building block for the real-time analysis of human behavior from video.
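
To make the parity space idea mentioned above more concrete, the following is a minimal sketch of the classical, purely linear (Euclidean) version only. It is not the kernel extension developed in the thesis, and it is not taken from the thesis itself. It assumes a hypothetical autonomous linear model x[k+1] = A x[k], y[k] = C x[k]; the matrices A and C, the window length s, and all function names are illustrative assumptions. A window of outputs consistent with the model lies in the column space of the stacked observability matrix, so its projection onto the orthogonal complement (the parity space) is close to zero, while an inconsistent window leaves a nonzero residual.

```python
import numpy as np

def observability_matrix(A, C, s):
    """Stack C, CA, ..., CA^s vertically (s + 1 blocks)."""
    blocks = [C]
    for _ in range(s):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def parity_matrix(O, tol=1e-10):
    """Orthonormal basis W of the left null space of O, so that W.T @ O = 0."""
    U, sv, _ = np.linalg.svd(O, full_matrices=True)
    rank = int(np.sum(sv > tol * sv.max()))
    return U[:, rank:]

def residual_energy(W, window):
    """Squared norm of the parity residual for a window of s + 1 outputs.

    `window` has shape (s + 1, p); rows are stacked into one long vector.
    """
    r = W.T @ window.reshape(-1)
    return float(r @ r)

# Hypothetical model (an assumption for illustration): a planar rotation
# observed through its first coordinate.
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
C = np.array([[1.0, 0.0]])
s = 4

O = observability_matrix(A, C, s)
W = parity_matrix(O)

# A window generated by the model itself: the residual should be near zero.
x = np.array([1.0, 0.5])
outputs = []
for _ in range(s + 1):
    outputs.append(C @ x)
    x = A @ x
consistent = np.array(outputs)

# The same window with its last sample perturbed: the residual is nonzero.
inconsistent = consistent.copy()
inconsistent[-1, 0] += 1.0

energy_consistent = residual_energy(W, consistent)
energy_inconsistent = residual_energy(W, inconsistent)
```

In an online setting, a residual of this kind would be recomputed as the window slides over incoming frames and compared against a threshold; the choice of threshold and the kernelized form of the projection are outside the scope of this sketch.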