IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2806-2826. doi: 10.1109/TPAMI.2020.3045007. Epub 2022 May 5.
The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction has emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable framework for representation learning, as it has demonstrated the potential to extract meaningful representations of the underlying patterns in natural videos. Motivated by the increasing interest in this task, we provide a review of deep learning methods for prediction in video sequences. We first define the fundamentals of video prediction, as well as the necessary background concepts and the most widely used datasets. Next, we carefully analyze existing video prediction models, organized according to a proposed taxonomy, highlighting their contributions and their significance in the field. The summary of datasets and methods is accompanied by experimental results that facilitate a quantitative assessment of the state of the art. The paper closes by drawing general conclusions, identifying open research challenges and pointing out future research directions.