Abstract: Action Recognition in
videos is an active research field that is fueled by an acute need, spanning
several application domains. Still, existing systems fall short of the
applications' needs in real-world scenarios, where the quality of the video is
less than optimal and the viewpoint is uncontrolled and often not static. In
this paper, we consider the key elements of motion encoding and focus on
capturing local changes in motion directions. In addition, we decouple image
edges from motion edges using a suppression mechanism, and compensate for global
camera motion by using an especially fitted registration scheme. Combined with a
standard bag-of-words technique, our method achieves state-of-the-art
performance in the most recent and challenging benchmarks.
Reference: O. Kliper-Gross, Y.
Gurovich, T. Hassner, and L. Wolf, Motion Interchange Patterns for Action
Recognition in Unconstrained Videos, European Conference on Computer Vision
(ECCV), Firenze, Italy, Oct 2012
The Motion Interchange Patterns (MIP) representation. (a) Our encoding is based on comparing two SSD (sum of squared differences) scores computed between three patches from three consecutive frames.
Relative to the location of the patch in the current frame, the location of the patch in the previous (next) frame
is said to be in direction i (j); the angle between directions i and j is denoted alpha. (b) The different
motion patterns captured by different i and j values. Blue arrows represent motion
from a patch in position i in the previous frame; red arrows represent motion to the patch in position j in the next frame.
Shaded diagonal strips indicate equal alpha values.
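The comparison described in (a) can be sketched in a few lines of Python. This is a minimal illustration, not the released MATLAB implementation. The eight-direction layout, the 3x3 patch size, the offset `step`, the threshold `theta`, the sign convention of the returned code, and all function names are assumptions made for the example; alpha is expressed as an index in units of 45 degrees.

```python
# Illustrative sketch only: direction layout, patch size, step, threshold,
# and sign convention are assumptions, not taken from the released code.

# Eight candidate directions, 45 degrees apart, as (dy, dx) offsets (assumed layout).
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def patch(frame, y, x, half=1):
    """Return the (2*half+1)^2 pixel values centred on (y, x) as a flat list."""
    return [frame[r][c]
            for r in range(y - half, y + half + 1)
            for c in range(x - half, x + half + 1)]


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mip_code(prev, cur, nxt, y, x, i, j, step=2, theta=10.0):
    """Ternary code for the direction pair (i, j) at pixel (y, x).

    Compares the SSD between the current patch and the previous-frame patch
    in direction i with the SSD between the current patch and the next-frame
    patch in direction j. Assumed convention: +1 if the next-frame patch is
    the clearly better match, -1 if the previous-frame patch is, 0 otherwise.
    """
    di, dj = DIRECTIONS[i], DIRECTIONS[j]
    centre = patch(cur, y, x)
    ssd_prev = ssd(patch(prev, y + step * di[0], x + step * di[1]), centre)
    ssd_next = ssd(patch(nxt, y + step * dj[0], x + step * dj[1]), centre)
    if ssd_prev - ssd_next > theta:
        return 1
    if ssd_next - ssd_prev > theta:
        return -1
    return 0


def alpha_index(i, j):
    """Angle between directions i and j, in units of 45 degrees (0..7)."""
    return (j - i) % 8
```

Under these assumptions, the codes for all (i, j) pairs that share the same `alpha_index` would be grouped into one channel, which corresponds to the shaded diagonal strips in (b).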
Below are some results of the MIP
representation, applied to recent Action Recognition benchmarks. For additional results and more details, please see the paper.
Table 1. Comparison to previous results on the
ASLAN benchmark. The average accuracy and standard error are given for each method (see the paper for details). All HOG, HOF, and HNF results are taken from
[Kliper-Gross et al. 11, Kliper-Gross et al. 12].
LTP [Yeffet & Wolf 09]                             55.45 ± 0.6%     58.50 ± 0.7%
HOG [Laptev et al. 08]                             58.55 ± 0.8%     60.15 ± 0.6%
HOF [Laptev et al. 08]                             56.82 ± 0.6%     58.62 ± 1.0%
HNF [Laptev et al. 08]                             58.67 ± 0.9%     57.20 ± 0.8%
MIP single channel alpha=0                         58.27 ± 0.6%     61.52 ± 0.8%
MIP single best channel alpha=1                    61.45 ± 0.8%     63.55 ± 0.8%
MIP w/o suppression                                61.67 ± 0.9%     63.17 ± 1.1%
MIP w/o motion compensation                        62.27 ± 0.8%     63.57 ± 1.0%
MIP w/o both                                       60.43 ± 1.0%     63.08 ± 0.9%
MIP on stabilized clips                            59.73 ± 0.77%    62.30 ± 0.77%
(label missing)                                    62.23 ± 0.8%     64.62 ± 0.8%
(label missing)                                    60.88 ± 0.8%     63.12 ± 0.9%
HOG+HOF+HNF with OSSML [Kliper-Gross et al. 11]    62.52 ± 0.8%     64.25 ± 0.7%
(label missing)                                    64.27 ± 1.0%     65.45 ± 0.8%
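Each entry in Table 1 is a mean accuracy followed by its standard error. As a reminder of how such a figure is commonly obtained from per-split accuracies, here is a minimal sketch. The function name and the sample accuracies are hypothetical, and the use of the sample standard deviation divided by the square root of the number of splits is an assumption about the estimator, not a statement of the authors' exact procedure.

```python
import math


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for a list of numbers.

    Assumed estimator: sample standard deviation (n - 1 in the denominator)
    divided by sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Hypothetical per-split accuracies (percent), for illustration only.
split_accuracies = [60.0, 62.0, 64.0]
mean, sem = mean_and_standard_error(split_accuracies)
print(f"{mean:.2f} ± {sem:.2f}%")
```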
Table 2. Comparison to previous results on the HMDB51 database. Since our method
contains a motion compensation component, we tested it on the more challenging unstabilized videos. Our method significantly
outperforms the best results obtained by previous work.
HOG/HOF [Laptev et al. 08]
C2 [Jhuang et al. 07]
Action Bank [Sadanand & Corso 12]
Table 3. Comparison to previous results on the UCF50 database. Our method significantly outperforms all reported methods.
MATLAB code for computing the Motion Interchange Patterns (MIP) video descriptor is now available for
download here. Please see the accompanying documentation
for information on how to install and run the code.
If you use this code in your own work, please cite our paper (see the reference above).
Copyright and disclaimer:
Copyright 2012, Orit Kliper-Gross, Yaron Gurovich, Tal Hassner, and Lior Wolf
The SOFTWARE ("MIPcode") is provided "as is", without any guarantee made as to
its suitability or fitness for any particular use. It may contain bugs, so use
of this tool is at your own risk. We take no responsibility for any damage that
may unintentionally be caused through its use.