Huang Ying, Fang Lin, Hu Shanfeng
Institute of Virtual Reality and Intelligent Systems, Hangzhou Normal University, Hangzhou 311121, China.
Department of Computer and Information Sciences, Northumbria University, Newcastle-upon-Tyne NE1 8ST, UK.
Sensors (Basel). 2023 Jul 19;23(14):6525. doi: 10.3390/s23146525.
We present a new method for recovering high-fidelity 3D facial geometry and appearance with enhanced textures from single-view images. While vision-based face reconstruction has been studied intensively over the past decades owing to its broad applications, it remains a challenging problem because human eyes are particularly sensitive to numerically minute yet perceptually significant details. Previous methods that minimize reconstruction error within a low-dimensional face space can suffer from this issue and generate close yet low-fidelity approximations. The loss of high-frequency texture details is a key weakness of their pipelines, which we propose to address by learning to recover both dense radiance residuals and sparse facial texture features from a single image, in addition to the variables solved for by previous work: shape, appearance, illumination, and camera. We integrate the estimation of all these factors into a single unified deep neural network and train it on several popular face reconstruction datasets. We also introduce two perceptual metrics, visual information fidelity (VIF) and structural similarity (SSIM), to compensate for the fact that reconstruction error is not a consistent perceptual measure of quality. On the popular FaceWarehouse facial reconstruction benchmark, our proposed system achieves a VIF score of 0.4802 and an SSIM score of 0.9622, improving over the state-of-the-art Deep3D method by 6.69% and 0.86%, respectively. On the widely used LS3D-300W dataset, we obtain a VIF score of 0.3922 and an SSIM score of 0.9079 for indoor images, and scores of 0.4100 and 0.9160 for outdoor images, which also improve over those of Deep3D. These results show that our method recovers visually more realistic facial appearance details than previous methods.
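As an aid to readers who wish to reproduce the evaluation protocol, the sketch below shows one common way to compute the two perceptual metrics named in the abstract. This is a minimal illustration under stated assumptions, not the paper's code: it assumes SSIM from scikit-image and the pixel-domain VIF approximation (vifp) from the third-party sewar package, computed on grayscale images in the 0-255 range.

```python
# Minimal sketch (not the paper's code): score a rendered face reconstruction
# against its source photo with VIF and SSIM.
# Assumptions: scikit-image for SSIM, the third-party `sewar` package for a
# pixel-domain VIF approximation; images are same-sized H x W x 3 uint8 RGB.
import numpy as np
from skimage.color import rgb2gray
from skimage.metrics import structural_similarity as ssim
from sewar.full_ref import vifp


def perceptual_scores(reference: np.ndarray, rendered: np.ndarray):
    """Return (VIF, SSIM) for two same-sized RGB images."""
    # Convert to grayscale in the 0-255 range, which both metrics expect here.
    ref_g = rgb2gray(reference) * 255.0
    ren_g = rgb2gray(rendered) * 255.0
    vif_score = vifp(ref_g, ren_g)                   # higher = more visual information preserved
    ssim_score = ssim(ref_g, ren_g, data_range=255.0)  # higher = more structurally similar
    return vif_score, ssim_score


if __name__ == "__main__":
    # Synthetic placeholder images; real use would load a photo and a render.
    rng = np.random.default_rng(0)
    photo = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
    render = np.clip(photo.astype(int) + rng.integers(-10, 10, photo.shape),
                     0, 255).astype(np.uint8)
    print(perceptual_scores(photo, render))
```

As a sanity check on the reported numbers, the 6.69% relative VIF gain over Deep3D on FaceWarehouse implies a Deep3D baseline of roughly 0.4802 / 1.0669 ≈ 0.450.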