Yasir Siddiqui Muhammad, Ahn Hyunsik
Department of Mechanical System Engineering, Tongmyong University, Busan 48520, Republic of Korea.
School of Artificial Intelligence, Tongmyong University, Busan 48520, Republic of Korea.
Biomimetics (Basel). 2024 Dec 9;9(12):747. doi: 10.3390/biomimetics9120747.
Depth estimation plays a pivotal role in advancing human-robot interaction, especially in indoor environments where accurate 3D scene reconstruction is essential for tasks like navigation and object handling. Monocular depth estimation, which relies on a single RGB camera, offers a more affordable solution than traditional methods that use stereo cameras or LiDAR. However, despite recent progress, many monocular approaches struggle to accurately delineate depth boundaries, leading to less precise reconstructions. In response to these challenges, this study introduces a novel depth estimation framework that leverages latent space features within a deep convolutional neural network to enhance the precision of monocular depth maps. The proposed model features a dual encoder-decoder architecture, enabling both color-to-depth and depth-to-depth transformations; this structure allows for refined depth estimation through latent space encoding. To further improve the accuracy of depth boundaries and local features, a new loss function is introduced that combines a latent loss with a gradient loss, helping the model preserve the integrity of depth boundaries. The framework is thoroughly evaluated on the NYU Depth V2 dataset, where it sets a new benchmark, particularly excelling in complex indoor scenes. The results show that this approach effectively reduces depth ambiguities and boundary blurring, making it a promising solution for applications in human-robot interaction and 3D scene reconstruction.
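The combined objective described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the exact forms of the latent and gradient terms, the use of L1/L2 norms, and the weighting coefficients `w_lat` and `w_grad` are assumptions for illustration only.

```python
import numpy as np

def gradient_loss(pred, gt):
    """L1 difference between spatial gradients of predicted and ground-truth
    depth maps (a common way to penalize blurred depth boundaries)."""
    dx_p, dy_p = np.diff(pred, axis=1), np.diff(pred, axis=0)
    dx_g, dy_g = np.diff(gt, axis=1), np.diff(gt, axis=0)
    return np.mean(np.abs(dx_p - dx_g)) + np.mean(np.abs(dy_p - dy_g))

def latent_loss(z_rgb, z_depth):
    """L2 distance between the latent codes produced by the color-to-depth
    and depth-to-depth encoders, encouraging a shared latent space."""
    return np.mean((z_rgb - z_depth) ** 2)

def total_loss(pred, gt, z_rgb, z_depth, w_lat=0.5, w_grad=1.0):
    """Pixel-wise depth loss plus weighted latent and gradient terms.
    The weights here are illustrative, not values from the paper."""
    pixel = np.mean(np.abs(pred - gt))
    return pixel + w_lat * latent_loss(z_rgb, z_depth) + w_grad * gradient_loss(pred, gt)
```

In training, `pred` would come from the color-to-depth decoder, `gt` from the sensor depth map, and `z_rgb`, `z_depth` from the two encoders; when prediction matches ground truth and the latent codes coincide, the loss is zero.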