

Adaptable 2D to 3D Stereo Vision Image Conversion Based on a Deep Convolutional Neural Network and Fast Inpaint Algorithm.

Author Information

Hachaj Tomasz

Affiliation

Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, AGH University of Krakow, Al. Mickiewicza 30, 30-059 Krakow, Poland.

Publication Information

Entropy (Basel). 2023 Aug 15;25(8):1212. doi: 10.3390/e25081212.

Abstract

Algorithms for converting 2D to 3D are gaining importance following the hiatus brought about by the discontinuation of 3D TV production; this is due to the high availability and popularity of virtual reality systems that use stereo vision. In this paper, several depth image-based rendering (DIBR) approaches using state-of-the-art single-frame depth generation neural networks and inpaint algorithms are proposed and validated, including a novel very fast inpaint (FAST). FAST significantly exceeds the speed of currently used inpaint algorithms by reducing computational complexity, without degrading the quality of the resulting image. The role of the inpaint algorithm is to fill in missing pixels in the stereo pair estimated by DIBR. Missing estimated pixels appear at the boundaries of areas that differ significantly in their estimated distance from the observer. In addition, we propose parameterizing DIBR using a singular, easy-to-interpret adaptable parameter that can be adjusted online according to the preferences of the user who views the visualization. This single parameter governs both the camera parameters and the maximum binocular disparity. The proposed solutions are also compared with a fully automatic 2D to 3D mapping solution. The algorithm proposed in this work, which features intuitive disparity steering, the foundational deep neural network MiDaS, and the FAST inpaint algorithm, received considerable acclaim from evaluators. The mean absolute error of the proposed solution does not contain statistically significant differences from state-of-the-art approaches like Deep3D and other DIBR-based approaches using different inpaint functions. Since both the source codes and the generated videos are available for download, all experiments can be reproduced, and one can apply our algorithm to any selected video or single image to convert it.
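The pipeline the abstract outlines — monocular depth estimation (e.g., with MiDaS), DIBR warping controlled by a single disparity parameter, and inpainting of the holes that open at depth discontinuities — can be illustrated with a minimal NumPy sketch. The warping rule, the `max_disparity_px` parameter name, and the left-neighbor hole filling below are illustrative assumptions for exposition, not the paper's actual FAST inpaint algorithm or its exact camera model.

```python
import numpy as np

def dibr_right_view(image, inv_depth, max_disparity_px=30):
    """Warp a left image into a synthetic right-eye view by shifting each pixel
    horizontally by a disparity proportional to its normalized inverse depth
    (values in [0, 1]). max_disparity_px plays the role of the single
    user-tunable strength parameter: larger values give a stronger stereo
    effect. Returns the warped image and a mask of holes (pixels that no
    source pixel mapped to)."""
    h, w = inv_depth.shape
    right = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    disparity = np.round(inv_depth * max_disparity_px).astype(int)  # near pixels shift more
    for y in range(h):
        for x in range(w):
            xr = x - disparity[y, x]
            if 0 <= xr < w:
                # A real DIBR implementation would resolve collisions by depth
                # ordering (nearest surface wins); this sketch just overwrites.
                right[y, xr] = image[y, x]
                filled[y, xr] = True
    return right, ~filled

def fill_holes_left_neighbor(image, holes):
    """Toy hole filling: copy the nearest originally filled pixel to the left
    (or whatever sits at the left edge). This only stands in for a proper
    inpaint such as the FAST algorithm the paper proposes; the holes it fills
    appear at depth discontinuities after the warp."""
    out = image.copy()
    h, w = holes.shape
    for y in range(h):
        for x in range(w):
            if holes[y, x]:
                xl = x
                while xl > 0 and holes[y, xl]:
                    xl -= 1
                out[y, x] = out[y, xl]
    return out
```

Given a monocular inverse-depth prediction normalized to [0, 1], `right, holes = dibr_right_view(left_img, inv_depth, 25)` followed by `fill_holes_left_neighbor(right, holes)` would produce the second view of a stereo pair, with the original image serving as the other view; the single disparity argument is the knob a viewer could adjust online.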


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d65/10453122/8110b3640554/entropy-25-01212-g001.jpg
