IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5335-5348. doi: 10.1109/TPAMI.2021.3067359. Epub 2022 Aug 4.
Imitation learning has recently been applied in autonomous camera systems to mimic the operation of a cameraman. To imitate a given demonstration video, existing methods require users to collect a large number of training videos with a similar filming style. Because the trained model is style-specific, it is difficult to generalize it to imitate videos with different filming styles. To address this problem, we propose a framework that we term "one-shot imitation filming," which can imitate a filming style by "seeing" only a single demonstration video of the target style, without style-specific model training. This is achieved by two key enabling techniques: 1) filming style feature extraction, which encodes the sequential cinematic characteristics of a variable-length video clip into a fixed-length feature vector; and 2) camera motion prediction, which dynamically plans the camera trajectory to reproduce the filming style of the demo video. We implemented the approach with a deep neural network and deployed it on a 6-degrees-of-freedom (DOF) drone system: the network first predicts the future camera motions, which are then converted into the drone's control commands via odometry. Experimental results on comprehensive datasets and real-world showcases demonstrate that the proposed approach significantly outperforms conventional baselines and can mimic footage of an unseen style with high fidelity.
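To make the two enabling techniques concrete, the following is a minimal sketch in PyTorch of how a variable-length demo clip could be summarized into a fixed-length style vector and then used to condition per-step camera motion prediction. This is an illustration only, not the authors' implementation: the module names, feature dimensions, and the choice of a GRU encoder and an MLP predictor are all assumptions.

```python
# Minimal sketch (not the paper's implementation) of the two components the
# abstract describes. All dimensions and architectures here are assumptions.
import torch
import torch.nn as nn


class StyleEncoder(nn.Module):
    """Encodes a variable-length sequence of per-frame cinematic features
    into one fixed-length filming-style vector."""

    def __init__(self, feat_dim=16, style_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, style_dim, batch_first=True)

    def forward(self, clip_feats):        # clip_feats: (B, T, feat_dim), T varies
        _, h_last = self.rnn(clip_feats)  # final hidden state summarizes the clip
        return h_last.squeeze(0)          # (B, style_dim) fixed-length vector


class MotionPredictor(nn.Module):
    """Predicts the next 6-DOF camera motion from the current filming state,
    conditioned on the target style vector."""

    def __init__(self, state_dim=16, style_dim=128, motion_dim=6):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim + style_dim, 256),
            nn.ReLU(),
            nn.Linear(256, motion_dim),
        )

    def forward(self, state, style):
        # Concatenate current state with the style embedding of the demo clip.
        return self.mlp(torch.cat([state, style], dim=-1))  # (B, 6) motion


# Usage: one demo clip yields one style vector; camera motions are then
# predicted step by step and would be converted to drone control commands
# downstream (via odometry, per the abstract).
encoder, predictor = StyleEncoder(), MotionPredictor()
demo_clip = torch.randn(1, 120, 16)   # a 120-frame demo, 16 features per frame
style = encoder(demo_clip)
motion = predictor(torch.randn(1, 16), style)
```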