Video event classification using bag-of-words and string kernels

Recognizing events in videos is an important and challenging task in automatic semantic video analysis. At present, one of the most successful frameworks for object recognition tasks is the bag-of-words (BoW) approach. However, it does not model the temporal information of the video stream. We are working on a novel method that introduces temporal information into the BoW approach by modeling a video clip as a sequence of histograms of visual features, computed from each frame using the traditional BoW model.
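
As a rough illustration of this representation, the sketch below turns a clip into one BoW histogram per frame. It assumes that local descriptors have already been extracted for each frame and that a visual codebook is available (for example from k-means clustering). The function names, the nearest-neighbour quantization, and the L1 normalization are illustrative choices, not details confirmed by the publications below.

```python
import numpy as np

def frame_histogram(descriptors, codebook):
    """Quantize one frame's local descriptors against a visual codebook
    and return an L1-normalized histogram of visual-word counts."""
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    words = dists.argmin(axis=1)  # index of the nearest visual word
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def clip_to_sequence(frames_descriptors, codebook):
    """Represent a clip as one histogram per frame, in temporal order."""
    return [frame_histogram(d, codebook) for d in frames_descriptors]

# Toy data (hypothetical): 3 visual words in a 2-D descriptor space, 2 frames.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
clip = [
    np.array([[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.2]]),
    np.array([[0.0, 0.9], [0.1, 1.2]]),
]
sequence = clip_to_sequence(clip, codebook)
```

Each element of `sequence` plays the role of one "character", so clips of different duration yield sequences of different length.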

The sequences are treated as strings in which each histogram is considered a character. Because the sequences vary in size with the length of the video clip, event classification is performed using SVM classifiers with a string kernel (e.g., one based on the Needleman-Wunsch edit distance). Experimental results on two domains, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
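
The idea of comparing variable-length histogram sequences with an edit distance can be sketched as follows. This is a minimal example under stated assumptions, not the implementation from the publications: the chi-square substitution cost, the constant gap penalty `gap`, the exponential kernel form, and the parameter `gamma` are all choices made here for illustration.

```python
import numpy as np

def chi2(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms
    (assumed substitution cost between two 'characters')."""
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def nw_distance(a, b, gap=1.0):
    """Global alignment cost (Needleman-Wunsch style dynamic programming)
    between two sequences of histograms, with a constant gap penalty."""
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = gap * np.arange(n + 1)  # align a[:i] against an empty sequence
    D[0, :] = gap * np.arange(m + 1)  # align an empty sequence against b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(
                D[i - 1, j - 1] + chi2(a[i - 1], b[j - 1]),  # substitution
                D[i - 1, j] + gap,                           # deletion
                D[i, j - 1] + gap,                           # insertion
            )
    return float(D[n, m])

def string_kernel(a, b, gamma=1.0):
    """Similarity derived from the alignment cost (assumed exponential form)."""
    return float(np.exp(-gamma * nw_distance(a, b)))

def gram_matrix(seqs, gamma=1.0):
    """Pairwise kernel values for a list of histogram sequences."""
    return np.array([[string_kernel(s, t, gamma) for t in seqs] for s in seqs])

# Toy sequences (hypothetical): y repeats the last frame of x once.
h0, h1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y = [h0, h1], [h0, h1, h1]
K = gram_matrix([x, y])
```

A Gram matrix like `K` could then be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Kernels derived from edit distances are not guaranteed to be positive semi-definite, so this should be checked in practice.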

Related Publications:

Ballan, L., M. Bertini, A. Del Bimbo, and G. Serra, “Video Event Classification using String Kernels”, Multimedia Tools and Applications, vol. 48, pp. 69–87, May, 2010.
Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Event Detection and Recognition for Semantic Annotation of Video”, Multimedia Tools and Applications, vol. 51, pp. 279–302, January, 2011.
Ballan, L., M. Bertini, A. Del Bimbo, and G. Serra, “Action Categorization in Soccer Videos using String Kernels”, Proc. of IEEE International Workshop on Content-Based Multimedia Indexing (CBMI), Chania, Crete, June, 2009.
Ballan, L., M. Bertini, A. Del Bimbo, and G. Serra, “Video Event Classification Using Bag of Words and String Kernels”, Proc. of International Conference on Image Analysis and Processing (ICIAP), Salerno, Italy, September, 2009.