Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization.

Author information

Chen Yu, Shen Chunhua, Chen Hao, Wei Xiu-Shen, Liu Lingqiao, Yang Jian

Publication information

IEEE Trans Pattern Anal Mach Intell. 2020 Jul;42(7):1654-1669. doi: 10.1109/TPAMI.2019.2901875. Epub 2019 Feb 26.

DOI:10.1109/TPAMI.2019.2901875

Abstract

Landmark/pose estimation in single monocular images has received much attention in computer vision due to its important applications. It remains a challenging task when input images come with severe occlusions caused by, e.g., adverse camera views. Under such circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting geometric constraints of landmark point inter-connectivity. To address the problem, we propose a novel structure-aware fully convolutional network that incorporates priors about the structure of pose components and takes them into account implicitly during training of the deep network. Explicit learning of such constraints is typically challenging. Instead, inspired by how humans identify implausible poses, we design discriminators to distinguish real poses from fake ones (such as biologically implausible ones). If the pose generator G generates results that the discriminator fails to distinguish from real ones, the network has successfully learned the priors. Training of the network follows the strategy of conditional Generative Adversarial Networks (GANs). The effectiveness of the proposed network is evaluated on three pose-related tasks: 2D human pose estimation, 2D facial landmark estimation and 3D human pose estimation. The proposed approach significantly outperforms several state-of-the-art methods and almost always generates plausible pose predictions, demonstrating the usefulness of implicit learning of structures using GANs.
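
The adversarial idea in the abstract can be illustrated with a minimal sketch of a textbook conditional GAN objective. This is not the paper's actual loss or architecture: the function names, the mean-squared-error regression term, and the `adv_weight` value are assumptions made here for illustration, and `d_real` / `d_fake` stand for a hypothetical discriminator's probability that an (image, pose) pair is real.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for a single probability, clamped away from 0 and 1."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """Reward the discriminator for scoring ground-truth poses near 1
    and generator-predicted poses near 0 (both conditioned on the image)."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake, pred, truth, adv_weight=0.01):
    """Regression error plus an adversarial term that shrinks when the
    discriminator is fooled. adv_weight is a hypothetical trade-off value."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)
    return mse + adv_weight * bce(d_fake, 1.0)
```

In this sketch, a prediction that the discriminator rates as realistic lowers the generator loss even when the regression error is unchanged, which is one way structural priors can enter training implicitly rather than through hand-written constraints.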

