Building a general human activity recognition and classification system is challenging because of variations in environment, people, and actions. Environmental variation can be caused by cluttered or moving backgrounds, camera motion, and illumination changes. People differ in size, shape, and posture. Recently, interest-point-based models have been successfully applied to human action classification because they overcome some limitations of holistic models, such as the need for background subtraction and tracking. We are working on a novel method based on the visual bag-of-words model and on a new spatio-temporal descriptor.
First, we define a new 3D gradient descriptor that, combined with optical flow, outperforms the state of the art without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient: cluster centers are attracted to the denser regions of the sample distribution, which yields a non-uniform description of the feature space and fails to encode other informative regions. Therefore, we apply a radius-based clustering method and a soft assignment that considers the two or more most relevant candidate codewords for each feature. This approach generates a more effective codebook, which further improves classification performance. We test our approach extensively on the standard KTH and Weizmann action datasets, demonstrating its validity and outperforming other recent approaches.
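To make the codebook idea more concrete, the following is a minimal sketch in Python with NumPy, not the implementation used in the publications below. It assumes a simple greedy variant of radius-based clustering (a feature becomes a new codeword only if no existing codeword lies within the radius) and a Gaussian-weighted soft assignment over the `k` nearest codewords. The function names and the parameters `radius`, `k`, and `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np


def radius_based_codebook(features, radius):
    """Greedy radius-based clustering (illustrative assumption, not the paper's exact method).

    A feature is added as a new codeword only when it is farther than
    `radius` from every codeword selected so far, so dense regions do not
    attract more codewords than sparse ones.
    """
    centers = [features[0]]
    for x in features[1:]:
        dists = np.linalg.norm(np.asarray(centers) - x, axis=1)
        if dists.min() > radius:
            centers.append(x)
    return np.asarray(centers)


def soft_assign_histogram(features, codebook, k=2, sigma=1.0):
    """Bag-of-words histogram where each feature votes for its k nearest codewords."""
    k = min(k, len(codebook))
    # Pairwise Euclidean distances, shape (n_features, n_codewords).
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    # Indices of the k closest codewords for every feature.
    nearest = np.argsort(dists, axis=1)[:, :k]
    rows = np.arange(len(features))[:, None]
    sq = dists[rows, nearest] ** 2
    # Gaussian weights; subtracting the smallest squared distance per row
    # keeps the closest codeword at weight 1 and avoids underflow to zero.
    weights = np.exp(-(sq - sq[:, :1]) / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)
    hist = np.zeros(len(codebook))
    # Unbuffered accumulation, since several features may share a codeword.
    np.add.at(hist, nearest, weights)
    return hist / hist.sum()
```

In this sketch every selected codeword is more than `radius` away from all others, so a dense blob of similar features contributes a single codeword while isolated features still get their own. A k-means codebook of the same size would typically place several centers inside the dense blob instead.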
Related Publications:
- Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos”, IEEE Transactions on Multimedia, vol. 14, pp. 1234–1245, Aug., 2012.
- Ballan, L., L. Seidenari, G. Serra, M. Bertini, and A. Del Bimbo, “Recognizing Human Actions by using Effective Codebooks and Tracking”, Advanced Topics in Computer Vision, Springer, 2013.
- Costantini, L., L. Seidenari, G. Serra, L. Capodiferro, and A. Del Bimbo, “Space-time Zernike Moments and Pyramid Kernel Descriptors for Action Classification”, Proc. of International Conference on Image Analysis and Processing (ICIAP), Ravenna, Italy, September, 2011.
- Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Effective Codebooks for Human Action Categorization”, Proc. of ICCV International Workshop on Video-oriented Object and Event Classification (VOEC), Kyoto, Japan, IEEE Computer Society, September, 2009.
- Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Human Action Recognition and Localization using Spatio-temporal Descriptors and Tracking”, Proc. of AI*IA International Workshop on Pattern Recognition and Artificial Intelligence for Human Behaviour Analysis (PRAI*HBA), Reggio Emilia, Italy, Springer, December, 2009.
- Ballan, L., M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, “Recognizing Human Actions by Fusing Spatio-temporal Appearance and Motion Descriptors”, Proc. of IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, IEEE Computer Society, November, 2009.