IEEE Trans Pattern Anal Mach Intell. 2020 Mar;42(3):622-635. doi: 10.1109/TPAMI.2018.2883327. Epub 2018 Nov 26.
A first-person video captures what the camera wearer (the actor) experiences through physical interactions with the surroundings. In this paper, we focus on the problem of Force from Motion: estimating, from a first-person video, the active force and torque that the actor exerts to drive her/his activity. We use two physical cues inherent in the first-person video. (1) Ego-motion: the camera motion is generated by the resultant of force interactions, which allows us to understand the effect of the active force using Newtonian mechanics. (2) Visual semantics: the first-person visual scene is deployed to afford the actor's activity and is therefore indicative of the physical context of the activity. We estimate the active force and torque using a dynamical system that describes the transition (dynamics) of the actor's physical state (position, orientation, and linear/angular momentum), where the latent physical state is observed only indirectly through the first-person video. We approximate the physical state with the 3D camera trajectory, which is reconstructed up to scale and orientation. The absolute scale factor and gravitational field are learned from the ego-motion and visual semantics of the first-person video. Inspired by optimal control theory, we solve the dynamical system by minimizing reprojection error. Our method achieves reconstruction that is quantitatively equivalent to IMU measurements in terms of gravity and scale recovery, and it outperforms methods based on 2D optical flow on an active action recognition task. We apply our method to first-person videos of mountain biking, urban bike racing, skiing, speedflying with a parachute, and wingsuit flying, where inertial measurements are not accessible.
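The ego-motion cue rests on Newtonian mechanics: once the camera trajectory has a metric scale and a gravity direction, the acceleration of the actor constrains the forces acting on her/him. The Python sketch below illustrates only that basic relation and is not the method of the paper, which solves a full dynamical system by minimizing reprojection error and also estimates torque. The function name `estimate_non_gravity_force`, the point-mass model, and the parameters `mass`, `scale`, and `gravity` are assumptions made for illustration. The value returned is the total non-gravitational force, which still mixes the active force with passive forces such as ground contact and drag.

```python
import numpy as np


def estimate_non_gravity_force(positions, dt, mass, scale=1.0,
                               gravity=(0.0, 0.0, -9.81)):
    """Finite-difference estimate of the non-gravitational force on a point mass.

    Hypothetical helper for illustration only.

    positions : (T, 3) array of camera positions from an up-to-scale reconstruction
    dt        : time between consecutive frames, in seconds
    mass      : assumed mass of the actor, in kg
    scale     : assumed factor converting reconstruction units to metres
    gravity   : assumed gravity vector in the reconstruction frame, in m/s^2
    returns   : (T - 2, 3) array of forces in newtons, one per interior frame
    """
    p = scale * np.asarray(positions, dtype=float)
    # Central second difference approximates the acceleration at interior frames.
    accel = (p[2:] - 2.0 * p[1:-1] + p[:-2]) / dt ** 2
    # Newton's second law: m * a = F_other + m * g, so F_other = m * (a - g).
    return mass * (accel - np.asarray(gravity, dtype=float))
```

For a trajectory in free fall the estimate is approximately zero, while for a stationary camera it equals the supporting force that cancels the actor's weight. Second differences amplify noise in the reconstructed trajectory, so a real pipeline would need smoothing or a model-based fit in place of this direct differencing.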