IEEE Trans Pattern Anal Mach Intell. 2018 Dec;40(12):3059-3066. doi: 10.1109/TPAMI.2017.2772922. Epub 2017 Nov 13.
Three-dimensional shape reconstruction from 2D landmark points on a single image is a hallmark of human vision, but it has proven difficult for computer vision algorithms. We define a feed-forward deep neural network algorithm that can reconstruct 3D shapes from 2D landmark points almost perfectly (i.e., with extremely small reconstruction errors), even when these 2D landmarks come from a single image. Our experimental results show an improvement of up to two-fold over state-of-the-art computer vision algorithms; 3D shape reconstruction error (measured as the Procrustes distance between the reconstructed shape and the ground truth) of human faces is , cars is .0022, human bodies is .022, and highly-deformable flags is .0004. Our algorithm was also a top performer at the 2016 3D Face Alignment in the Wild Challenge competition (held in conjunction with the European Conference on Computer Vision, ECCV), which required the reconstruction of 3D face shape from a single image. The derived algorithm can be trained in a couple of hours, and testing runs at more than 1,000 frames/s on an i7 desktop. We also present an innovative data augmentation approach that allows us to train the system efficiently with a small number of samples. Finally, the system is robust to noise (e.g., imprecise landmark points) and missing data (e.g., occluded or undetected landmark points).
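The abstract reports reconstruction errors as a Procrustes distance but does not state its exact normalization. The sketch below assumes one common convention: center both shapes, scale each to unit Frobenius norm, then align the estimate to the ground truth with the optimal rotation and uniform scale before summing squared point-wise differences. The function name `procrustes_distance` and the restriction to proper rotations (no reflections) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def procrustes_distance(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Squared Procrustes distance between two (n_points, dim) landmark shapes.

    Assumed convention (not taken from the paper): both shapes are centered,
    scaled to unit Frobenius norm, and the estimate is aligned to the
    reference with the optimal proper rotation and uniform scale.
    """
    a = np.asarray(reference, dtype=float)
    b = np.asarray(estimate, dtype=float)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("expected two arrays of identical shape (n_points, dim)")

    # Remove translation: move each centroid to the origin.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Remove scale: normalize each shape to unit Frobenius norm.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)

    # Orthogonal Procrustes via SVD of the cross-covariance b^T a.
    u, s, vt = np.linalg.svd(b.T @ a)
    # Flip the weakest direction if needed so the result is a rotation,
    # not a reflection (Kabsch correction).
    signs = np.ones_like(s)
    signs[-1] = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag(signs) @ vt
    # Optimal uniform scale for the rotated estimate (b has unit norm).
    scale = float(np.sum(s * signs))

    aligned = scale * (b @ rotation)
    return float(np.sum((a - aligned) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.normal(size=(68, 3))  # hypothetical 68-point 3D shape
    theta = 0.7
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    # Same shape, rotated, rescaled and translated: distance is ~0.
    moved = 2.5 * truth @ rot_z + np.array([1.0, -3.0, 0.5])
    print(procrustes_distance(truth, moved))
    # Small landmark noise gives a small positive distance.
    noisy = truth + 0.01 * rng.normal(size=truth.shape)
    print(procrustes_distance(truth, noisy))
```

Under this convention the distance lies in [0, 1], and a reconstruction that differs from the ground truth only by rotation, uniform scale, and translation scores 0. Unlike `scipy.spatial.procrustes`, which permits reflections, this version counts a mirror-image reconstruction as an error, which matters when depth must be inferred from a single view.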