The recognition of events in videos is a relevant and challenging task in automatic semantic video analysis. At present, one of the most successful frameworks for object recognition is the bag-of-words (BoW) approach. However, it does not model the temporal information of the video stream. We are working on a novel method that introduces temporal information into the BoW approach by modeling a video clip as a sequence of histograms of visual features, one computed from each frame using the traditional BoW model.
The sequences are treated as strings in which each histogram is considered a character. Because the length of a sequence depends on the length of the video clip, event classification of these variable-size sequences is performed using SVM classifiers with a string kernel (e.g., one based on the Needleman-Wunsch edit distance). Experimental results on two domains, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
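The idea of comparing clips as strings of histograms can be illustrated with a minimal sketch. It is not the implementation used in the publications below. The chi-square substitution cost, the constant gap penalty `gap`, the exponential kernel form, and the parameter `gamma` are all assumptions chosen for illustration, as are the function names.

```python
import math


def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms (assumed substitution cost)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def nw_distance(seq_a, seq_b, gap=1.0, sub_cost=chi_square):
    """Needleman-Wunsch style global alignment cost between two sequences.

    Each sequence is a list of per-frame BoW histograms, so one histogram
    plays the role of one character of the string.
    """
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = minimal cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # delete i frames
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # insert j frames
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),  # substitute
                dp[i - 1][j] + gap,  # gap in seq_b
                dp[i][j - 1] + gap,  # gap in seq_a
            )
    return dp[n][m]


def string_kernel(seq_a, seq_b, gamma=1.0, gap=1.0):
    """Similarity derived from the edit distance (assumed form: exp(-gamma * d))."""
    return math.exp(-gamma * nw_distance(seq_a, seq_b, gap=gap))


# Two toy clips of different length, with 2-bin normalized histograms per frame.
clip_a = [[1.0, 0.0], [0.0, 1.0]]
clip_b = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(nw_distance(clip_a, clip_b), string_kernel(clip_a, clip_b))
```

A matrix of such kernel values over all training clips could be passed to an SVM that accepts a precomputed kernel. Note that a kernel built from an edit distance in this way is not guaranteed to be positive semidefinite, so this sketch should be read as an illustration of the comparison step only.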
Related Publications:
- Ballan, L., M. Bertini, A. Del Bimbo, and G. Serra, “Video Event Classification using String Kernels”, Multimedia Tools and Applications, vol. 48, pp. 69–87, May, 2010.
- Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Event Detection and Recognition for Semantic Annotation of Video”, Multimedia Tools and Applications, vol. 51, pp. 279–302, January, 2011.
- Ballan, L., M. Bertini, A. Del Bimbo, and G. Serra, “Action Categorization in Soccer Videos using String Kernels”, Proc. of IEEE International Workshop on Content-Based Multimedia Indexing (CBMI), Chania, Crete, June, 2009.
- Ballan, L., M. Bertini, A. Del Bimbo, and G. Serra, “Video Event Classification Using Bag of Words and String Kernels”, Proc. of International Conference on Image Analysis and Processing (ICIAP), Salerno, Italy, September, 2009.