IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5280-5292. doi: 10.1109/TPAMI.2021.3075676. Epub 2022 Aug 4.
Contextual information plays an important role in solving various image and scene understanding tasks. Prior work has focused on extracting contextual information from an image and using it to infer the properties of objects in the image or to understand the scene behind the image, e.g., context-based object detection, recognition, and semantic segmentation. In this paper, we consider the inverse problem: how to hallucinate the missing contextual information from the properties of standalone objects. We refer to this as object-level scene context prediction. The problem is difficult, as it requires extensive knowledge of the complex and diverse relationships among objects in a scene. We propose a deep neural network that takes as input the properties (i.e., category, shape, and position) of a few standalone objects and predicts an object-level scene layout that compactly encodes the semantics and structure of the scene context in which the given objects reside. Quantitative experiments and user studies demonstrate that our model generates more plausible scene contexts than the baselines. Our model also enables the synthesis of realistic scene images from partial scene layouts. Finally, we validate that the model internally learns features useful for scene recognition and fake scene detection.
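The abstract specifies only the model's interface (object properties in, object-level scene layout out), not its architecture. The PyTorch sketch below is a minimal, hypothetical rendering of that interface, not the authors' method: the names ObjectSpec and ContextPredictor, the 150-category label set, the 64x64 layout grid, and the toy convolutional completion network are all illustrative assumptions. It shows one plausible convention, rasterizing each input object as a shape mask placed at its position in a per-category canvas, then predicting a category for every cell of the full layout.

```python
# Hypothetical sketch of the object-level scene context prediction task.
# NOT the paper's architecture; all names and sizes are assumptions.
from dataclasses import dataclass
import torch
import torch.nn as nn

NUM_CATEGORIES = 150   # assumed size of the semantic label set
GRID = 64              # assumed spatial resolution of the layout map

@dataclass
class ObjectSpec:
    """Properties of one standalone input object."""
    category: int        # semantic class index
    mask: torch.Tensor   # (GRID, GRID) binary shape mask, already placed
                         # at the object's position in the scene canvas

def to_partial_layout(objects: list[ObjectSpec]) -> torch.Tensor:
    """Rasterize the given objects into a partial semantic layout with one
    channel per category; cells not covered by any object stay all-zero."""
    layout = torch.zeros(NUM_CATEGORIES, GRID, GRID)
    for obj in objects:
        layout[obj.category] = torch.maximum(layout[obj.category], obj.mask)
    return layout

class ContextPredictor(nn.Module):
    """Toy stand-in network mapping a partial layout to per-cell category
    scores for a complete object-level scene layout."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(NUM_CATEGORIES, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, NUM_CATEGORIES, 1),
        )

    def forward(self, partial: torch.Tensor) -> torch.Tensor:
        return self.net(partial)

# Usage: hallucinate the context around a single standalone object.
mask = torch.zeros(GRID, GRID)
mask[20:44, 20:44] = 1.0                 # square "object" at its scene position
obj = ObjectSpec(category=7, mask=mask)  # category index 7 is arbitrary

model = ContextPredictor()
scores = model(to_partial_layout([obj]).unsqueeze(0))  # (1, C, GRID, GRID)
layout = scores.argmax(dim=1)            # (1, GRID, GRID) category per cell
```

Under this reading, the predicted layout is itself an intermediate representation: the abstract's image-synthesis result would correspond to feeding such a completed layout into a separate layout-to-image generator.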