Perception Team, INRIA Rhone-Alpes, 655 Avenue de l'Europe, Montbonnot Saint-Martin, Grenoble, Rhone-Alpes, France.
IEEE Trans Pattern Anal Mach Intell. 2013 Oct;35(10):2371-86. doi: 10.1109/TPAMI.2013.56.
This paper addresses the problem of video alignment. We present efficient approaches for the spatiotemporal alignment of two sequences. Unlike most related work, we consider independently moving cameras that capture a 3D scene at different times. The novelty of the proposed method lies in adapting and extending an efficient information retrieval framework that casts the two sequences as an image database and a set of query frames, respectively. The efficient retrieval builds on the recently proposed quad descriptor. In this context, we define the 3D Vote Space (VS) by aggregating votes through a multiquerying (multiscale) scheme, and we present two solutions based on VS entries: a causal solution that permits online synchronization and a global solution through multiscale dynamic programming. In addition, we extend the recently introduced ECC image-alignment algorithm to the temporal dimension, which allows for spatial registration and synchronization refinement with subframe accuracy. We investigate full search and quantization methods for short descriptors, and we compare the proposed schemes with the state of the art. Experiments with real videos captured by moving or static cameras demonstrate the efficiency of the proposed method and verify its effectiveness with respect to spatiotemporal alignment accuracy.
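As a rough illustration of the global solution, the sketch below runs a plain dynamic program over a simplified 2D vote matrix. This is not the paper's method: the 3D Vote Space, the multiscale scheme, and the quad-descriptor retrieval are all omitted, and the names `align_by_dp` and `votes` are hypothetical. It assumes only that `votes[i][j]` counts how strongly query frame `i` matches reference frame `j`, and that the temporal mapping is non-decreasing.

```python
import numpy as np


def align_by_dp(votes):
    """Map each query frame to a reference frame by dynamic programming.

    votes[i][j] is an assumed vote count of query frame i for reference
    frame j. The returned path is non-decreasing in j and maximizes the
    sum of votes along the path.
    """
    votes = np.asarray(votes, dtype=float)
    n_query, n_ref = votes.shape
    score = np.zeros((n_query, n_ref))
    back = np.zeros((n_query, n_ref), dtype=int)

    score[0] = votes[0]
    for i in range(1, n_query):
        best_k = 0  # index of the best predecessor among columns 0..j
        for j in range(n_ref):
            if score[i - 1, j] > score[i - 1, best_k]:
                best_k = j
            back[i, j] = best_k
            score[i, j] = votes[i, j] + score[i - 1, best_k]

    # Backtrack from the best-scoring cell in the last row.
    path = [0] * n_query
    j = int(np.argmax(score[-1]))
    path[-1] = j
    for i in range(n_query - 1, 0, -1):
        j = int(back[i, j])
        path[i - 1] = j
    return path
```

Calling `align_by_dp` on a matrix whose strongest votes lie along a shifted diagonal recovers that temporal offset, even when an isolated spurious vote elsewhere is larger than any single vote on the diagonal, because the monotonicity constraint penalizes paths that pass through it.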