Motion Interchange Patterns for Action Recognition in Unconstrained Videos

Orit Kliper-Gross, The Weizmann Institute of Science

Yaron Gurovich, Tel-Aviv University

Tal Hassner, The Open University of Israel

Lior Wolf, Tel-Aviv University

New! Slides are now available
New! Code is now available

Abstract: Action recognition in videos is an active research field, fueled by an acute need spanning several application domains. Still, existing systems fall short of the applications' needs in real-world scenarios, where video quality is less than optimal and the viewpoint is uncontrolled and often not static. In this paper, we consider the key elements of motion encoding and focus on capturing local changes in motion directions. In addition, we decouple image edges from motion edges using a suppression mechanism, and compensate for global camera motion by using an especially fitted registration scheme. Combined with a standard bag-of-words technique, our method achieves state-of-the-art performance on the most recent and challenging benchmarks.
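For intuition about the motion compensation step mentioned above, here is a minimal MATLAB sketch of a translation-only frame registration. The paper's registration scheme is especially fitted to the MIP pipeline, so this exhaustive-SSD search is only a generic stand-in; the function name, the integer shift range maxShift, and the scoring are illustrative assumptions.

% Generic stand-in for global motion compensation (NOT the paper's fitted
% registration scheme): estimate the dominant integer translation between
% two grayscale frames by exhaustive SSD search over small shifts.
function [dy, dx] = estimate_translation(prevFrame, currFrame, maxShift)
    prevFrame = double(prevFrame); currFrame = double(currFrame);
    % Score only the interior region, so circshift wrap-around is ignored.
    rows = (maxShift + 1):(size(currFrame, 1) - maxShift);
    cols = (maxShift + 1):(size(currFrame, 2) - maxShift);
    best = inf; dy = 0; dx = 0;
    for sy = -maxShift:maxShift
        for sx = -maxShift:maxShift
            shifted = circshift(prevFrame, [sy, sx]);
            d = shifted(rows, cols) - currFrame(rows, cols);
            s = sum(d(:).^2);           % SSD of the candidate alignment
            if s < best
                best = s; dy = sy; dx = sx;
            end
        end
    end
end

The recovered (dy, dx) would then be used to align the previous frame to the current one before the motion encoding is computed.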

Reference: O. Kliper-Gross, Y. Gurovich, T. Hassner, and L. Wolf, Motion Interchange Patterns for Action Recognition in Unconstrained Videos, European Conference on Computer Vision (ECCV), Firenze, Italy, Oct. 2012.

Click here for the PDF (Note: draft version)
Click here for PDF slides (light version, ~2.5MB)
Click here for PPTX slides (zip includes videos, ~86.5MB)
Click here for the BibTeX


Related projects:
The Action Similarity Labeling Challenge (ASLAN)

Supplementary video




The Motion Interchange Patterns (MIP) representation. (a) Our encoding is based on comparing two SSD scores computed between three patches from three consecutive frames. Relative to the location of the patch in the current frame, the location of the patch in the previous (next) frame is said to be in direction i (j); the angle between directions i and j is denoted alpha. (b) Illustration of the different motion patterns captured by different i and j values. Blue arrows represent motion from a patch at position i in the previous frame; red arrows represent motion to the patch at position j in the next frame. Shaded diagonal strips indicate equal alpha values.
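To make the encoding in (a) concrete, the following is a minimal MATLAB sketch of the per-pixel comparison. The parameters here are illustrative assumptions (3x3 patches, eight compass offsets of radius r, trinary threshold tau); see the paper for the exact settings and sign convention. Each (i, j) pair yields a code in {-1, 0, +1}, and the diagonals of the resulting 8x8 matrix correspond to fixed alpha values, matching the shaded strips in (b).

% Sketch of the per-pixel MIP comparison (illustrative parameters only).
% The caller must keep (y, x) at least r+2 pixels away from the frame border.
function codes = mip_codes(prevF, currF, nextF, y, x, r, tau)
    dirs  = r * [-1 -1; -1 0; -1 1; 0 1; 1 1; 1 0; 1 -1; 0 -1]; % 8 compass offsets
    patch = @(F, cy, cx) double(F(cy-1:cy+1, cx-1:cx+1));       % 3x3 patch at (cy, cx)
    ssd   = @(A, B) sum((A(:) - B(:)).^2);                      % sum of squared differences
    base  = patch(currF, y, x);                                 % patch in the current frame
    codes = zeros(8, 8);
    for i = 1:8      % direction of the patch in the previous frame
        ssdPrev = ssd(patch(prevF, y + dirs(i,1), x + dirs(i,2)), base);
        for j = 1:8  % direction of the patch in the next frame
            ssdNext = ssd(base, patch(nextF, y + dirs(j,1), x + dirs(j,2)));
            if ssdPrev - ssdNext > tau
                codes(i, j) = 1;      % match at offset i is clearly worse than at j
            elseif ssdNext - ssdPrev > tau
                codes(i, j) = -1;     % match at offset j is clearly worse than at i
            end                       % otherwise 0: no clear preference
        end
    end
end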



Results

Below are some results of the MIP representation applied to recent action recognition benchmarks. For additional results and more details, please see the paper.
Table 1. Comparison to previous results on the ASLAN benchmark. The average accuracy and standard error on the ASLAN benchmark are given for each method, with and without CSML (see text for details). All HOG, HOF, and HNF results are taken from [Kliper-Gross et al. 11, Kliper-Gross et al. 12].

System | Accuracy (no CSML) | AUC (no CSML) | Accuracy (with CSML) | AUC (with CSML)
LTP [Yeffet & Wolf 09] | 55.45 ± 0.6% | 57.2 | 58.50 ± 0.7% | 62.4
HOG [Laptev et al. 08] | 58.55 ± 0.8% | 61.59 | 60.15 ± 0.6% | 64.2
HOF [Laptev et al. 08] | 56.82 ± 0.6% | 58.56 | 58.62 ± 1.0% | 61.8
HNF [Laptev et al. 08] | 58.67 ± 0.9% | 62.16 | 57.20 ± 0.8% | 60.5
MIP, single channel (alpha = 0) | 58.27 ± 0.6% | 61.7 | 61.52 ± 0.8% | 66.5
MIP, single best channel (alpha = 1) | 61.45 ± 0.8% | 66.1 | 63.55 ± 0.8% | 69.0
MIP w/o suppression | 61.67 ± 0.9% | 66.4 | 63.17 ± 1.1% | 68.4
MIP w/o motion compensation | 62.27 ± 0.8% | 66.4 | 63.57 ± 1.0% | 69.5
MIP w/o both | 60.43 ± 1.0% | 64.8 | 63.08 ± 0.9% | 68.2
MIP on stabilized clips | 59.73 ± 0.77% | 62.9 | 62.30 ± 0.77% | 66.4
MIP | 62.23 ± 0.8% | 67.5 | 64.62 ± 0.8% | 70.4
HOG+HOF+HNF | 60.88 ± 0.8% | 65.3 | 63.12 ± 0.9% | 68.0
HOG+HOF+HNF with OSSML [Kliper-Gross et al. 11] | 62.52 ± 0.8% | 66.6 | 64.25 ± 0.7% | 69.1
MIP+HOG+HOF+HNF | 64.27 ± 1.0% | 69.2 | 65.45 ± 0.8% | 71.92


Table 2. Comparison to previous results on the HMDB51 database. Since our method contains a motion compensation component, we tested it on the more challenging, unstabilized videos. Our method significantly outperforms the best results obtained by previous work.

System | Original clips | Stabilized clips
HOG/HOF [Laptev et al. 08] | 20.44% | 21.96%
C2 [Jhuang et al. 07] | 22.83% | 23.18%
Action Bank [Sadanand & Corso 12] | 26.90% | N/A
MIP | 29.17% | N/A


Table 3. Comparison to previous results on the UCF50 database. Our method significantly outperforms all reported methods.

System | Splits | LOgO (leave-one-group-out)
HOG/HOF [Laptev et al. 08] | 47.9% | N/A
Action Bank [Sadanand & Corso 12] | 57.9% | N/A
MIP | 68.51% | 72.68%



MIP video descriptor for action recognition - Code

MATLAB code for computing the Motion Interchange Patterns (MIP) video descriptor is now available for download here. Please see the ReadMe.txt for information on how to install and run the code.
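As a rough illustration of the "standard bag-of-words technique" mentioned in the abstract, the MATLAB sketch below pools discrete per-pixel MIP words into per-cell histograms and concatenates them into a clip-frame descriptor. The one-word-per-pixel-per-channel input, the cell size, and the function name are assumptions made for illustration; they do not reflect the released code's actual interface (see ReadMe.txt for that).

% Hedged sketch of bag-of-words-style pooling over discrete MIP words.
% wordMap: HxW matrix of integer codes in [0, 255] for one MIP channel.
function H = pool_mip_words(wordMap, cellSize)
    [h, w] = size(wordMap);
    H = [];
    for cy = 1:cellSize:(h - cellSize + 1)
        for cx = 1:cellSize:(w - cellSize + 1)
            c = wordMap(cy:cy+cellSize-1, cx:cx+cellSize-1);
            % One 256-bin histogram per spatial cell, concatenated.
            H = [H, histcounts(c(:), 0:256)]; %#ok<AGROW>
        end
    end
    H = H / max(sum(H), 1);  % L1-normalize the concatenated descriptor
end

A clip-level descriptor could then be formed by pooling H over frames (and channels) before the classification stage.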

If you use this code in your own work, please cite our paper.

Copyright and disclaimer:
Copyright 2012, Orit Kliper-Gross, Yaron Gurovich, Tal Hassner, and Lior Wolf

The SOFTWARE ("MIPcode") is provided "as is", without any guarantee made as to its suitability or fitness for any particular use. It may contain bugs, so use of this tool is at your own risk. We take no responsibility for any damage that may unintentionally be caused through its use.

Last update: Dec. 27, 2012