Human action categorization in unconstrained videos

Building a general system for human activity recognition and classification is challenging because of variations in environment, people, and actions. Environmental variation can be caused by cluttered or moving backgrounds, camera motion, and illumination changes, while people differ in size, shape, and posture. Recently, interest-point-based models have been successfully applied to human action classification because they overcome some limitations of holistic models, such as the need for background subtraction and tracking. We are working on a novel method based on the visual bag-of-words model and a new spatio-temporal descriptor.

First, we define a new 3D gradient descriptor that, combined with optic flow, outperforms the state of the art without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient: cluster centers are attracted to the denser regions of the sample distribution, which yields a non-uniform description of the feature space and fails to code other informative regions. We therefore apply a radius-based clustering method and a soft assignment that takes into account two or more relevant candidate codewords. This approach generates a more effective codebook and further improves classification performance. We test our approach extensively on the standard KTH and Weizmann action datasets, where it outperforms other recent approaches.
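The codebook idea described above can be illustrated with a minimal sketch. It assumes local descriptors are already available as rows of a NumPy array, and it uses a simplified greedy form of radius-based clustering together with a Gaussian-weighted soft assignment. The function names `radius_codebook` and `soft_histogram`, the parameters `radius`, `k` and `sigma`, and the Gaussian weighting are illustrative assumptions, not necessarily the procedure used in the publications listed below.

```python
import numpy as np


def radius_codebook(features, radius):
    """Greedy radius-based clustering (simplified, illustrative variant).

    A feature becomes a new codeword only if it lies farther than
    `radius` from every codeword chosen so far, so dense regions
    cannot attract more than one codeword per radius-sized ball.
    """
    features = np.asarray(features, dtype=float)
    centers = [features[0]]
    for x in features[1:]:
        dists = np.linalg.norm(np.asarray(centers) - x, axis=1)
        if dists.min() > radius:
            centers.append(x)
    return np.asarray(centers)


def soft_histogram(features, codebook, k=2, sigma=1.0):
    """Bag-of-words histogram with soft assignment (assumed Gaussian weights).

    Each feature votes for its k nearest codewords; the votes of one
    feature sum to 1, and the final histogram is L1-normalized.
    """
    features = np.asarray(features, dtype=float)
    k = min(k, len(codebook))
    hist = np.zeros(len(codebook))
    for x in features:
        dists = np.linalg.norm(codebook - x, axis=1)
        nearest = np.argsort(dists)[:k]
        d2 = dists[nearest] ** 2
        # Subtract the smallest squared distance for numerical stability;
        # this leaves the normalized weights unchanged.
        weights = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))
        hist[nearest] += weights / weights.sum()
    return hist / hist.sum()
```

With `k=1` the histogram reduces to the usual hard-assignment bag of words. Unlike k-means, the number of codewords is not fixed in advance here: it follows from `radius` and from how widely the features spread in descriptor space.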

Related Publications:

Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos”, IEEE Transactions on Multimedia, vol. 14, pp. 1234–1245, Aug., 2012.
Ballan, L., L. Seidenari, G. Serra, M. Bertini, and A. Del Bimbo, “Recognizing Human Actions by using Effective Codebooks and Tracking”, in Advanced Topics in Computer Vision, Springer, 2013.
Costantini, L., L. Seidenari, G. Serra, L. Capodiferro, and A. Del Bimbo, “Space-time Zernike Moments and Pyramid Kernel Descriptors for Action Classification”, Proc. of International Conference on Image Analysis and Processing (ICIAP), Ravenna, Italy, September, 2011.
Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Effective Codebooks for Human Action Categorization”, Proc. of ICCV International Workshop on Video-oriented Object and Event Classification (VOEC), Kyoto, Japan, IEEE Computer Society, September, 2009.
Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Human Action Recognition and Localization using Spatio-temporal Descriptors and Tracking”, Proc. of AI*IA International Workshop on Pattern Recognition and Artificial Intelligence for Human Behaviour Analysis (PRAI*HBA), Reggio Emilia, Italy, Springer, December, 2009.
Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Recognizing Human Actions by Fusing Spatio-temporal Appearance and Motion Descriptors”, Proc. of IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, IEEE Computer Society, November, 2009.