Hadfield Simon, Lebeda Karel, Bowden Richard
CVSSP, University of Surrey, GU2 7XH Guildford, UK.
Int J Comput Vis. 2017;121(1):95-110. doi: 10.1007/s11263-016-0917-2. Epub 2016 Jun 21.
Action recognition "in the wild" is extremely challenging, particularly when complex 3D actions are projected down to the image plane, losing a great deal of information. The recent growth of 3D data in broadcast content and commercial depth sensors makes it possible to overcome this. However, there is little work examining the best way to exploit this new modality. In this paper we introduce the Hollywood 3D benchmark, the first dataset containing "in the wild" action footage with 3D data. The dataset consists of 650 stereo video clips across 14 action classes, taken from Hollywood movies. We provide stereo calibrations and depth reconstructions for each clip. We also provide an action recognition pipeline and propose a number of specialised depth-aware techniques, including five interest point detectors and three feature descriptors. Extensive tests allow evaluation of different appearance and depth encoding schemes. Our novel depth-exploiting techniques reach performance levels more than triple those of the best baseline algorithm using only appearance information. The benchmark data, code and calibrations are all made available to the community.