Harvesting Motion Patterns in Still Images from the Internet

Jiajun Wu, Tsinghua University
Yining Wang, Tsinghua University
Zhulin Li, Tsinghua University
Zhuowen Tu, University of California, San Diego


Most vision research on motion analysis focuses on learning human actions from video clips. In this paper, we investigate the use of still images, rather than videos, for motion recognition. We present evidence from both human cognition and computer vision that still images do indeed contain a wealth of information about motion patterns. Our contributions are three-fold. First, we automatically determine classes of motions that can effectively be characterized by still images. To make this determination, we introduce the notions of motion verbs (M-verbs) and motion phrases (M-phrases); these refer to linguistic concepts motivated by visual cognition and are not restricted to motions performed by humans. Second, we build UCSD-1024, a large dataset distilled from more than two million still images. These images come from 1,024 categories of motion; we use crowdsourcing to provide human validation of the motion categories. Third, we exploit motion patterns from UCSD-1024 using a weakly-supervised learning strategy and demonstrate performance competitive with state-of-the-art computer vision action classification methods.

