Zhong Zhiwei, Liu Xianming, Jiang Junjun, Zhao Debin, Chen Zhiwen, Ji Xiangyang
IEEE Trans Image Process. 2022;31:648-663. doi: 10.1109/TIP.2021.3131041. Epub 2021 Dec 28.
A depth map records the distance between the viewpoint and the objects in a scene, and it plays a critical role in many real-world applications. However, depth maps captured by consumer-grade RGB-D cameras suffer from low spatial resolution. Guided depth map super-resolution (DSR) is a popular approach to this problem: it attempts to restore a high-resolution (HR) depth map from the input low-resolution (LR) depth map and its coupled HR RGB image, which serves as the guidance. The most challenging issue for guided DSR is how to correctly select and propagate the structures that are consistent between the RGB guidance and the depth map, while properly handling inconsistent ones. In this paper, we propose a novel attention-based hierarchical multi-modal fusion (AHMF) network for guided DSR. Specifically, to effectively extract and combine relevant information from the LR depth and the HR guidance, we propose a multi-modal attention-based fusion (MMAF) strategy for hierarchical convolutional layers. It includes a feature enhancement block that selects valuable features and a feature recalibration block that unifies the similarity metrics of modalities with different appearance characteristics. Furthermore, we propose a bi-directional hierarchical feature collaboration (BHFC) module to fully leverage low-level spatial information and high-level structural information across multi-scale features. Experimental results show that our approach outperforms state-of-the-art methods in reconstruction accuracy, running speed, and memory efficiency.
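To make the guided DSR setting concrete, the sketch below implements a classical, non-learned baseline: joint bilateral upsampling. It is not the AHMF network, and it does not reproduce MMAF or BHFC. It only shows how an HR guidance image can steer the upsampling of an LR depth map. Each HR output pixel averages nearby LR depth samples, weighted by spatial distance and by how similar the guidance intensities are. The function name and the parameters `radius`, `sigma_s`, and `sigma_r` are illustrative assumptions, and so is the single-channel guide with intensities roughly in [0, 1] (an RGB guide would need a color distance instead).

```python
import numpy as np


def joint_bilateral_upsample(depth_lr, guide_hr, scale, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Hypothetical helper: upsample `depth_lr` by an integer `scale`, guided by `guide_hr`.

    Assumes `guide_hr` is a grayscale HR image with intensities roughly in [0, 1]
    and shape (h * scale, w * scale). LR pixel (qy, qx) is aligned with HR pixel
    (qy * scale, qx * scale) (top-left alignment, an illustrative choice).
    """
    h, w = depth_lr.shape
    H, W = guide_hr.shape
    if (H, W) != (h * scale, w * scale):
        raise ValueError("guide_hr must be exactly `scale` times the size of depth_lr")

    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            # Position of this HR pixel expressed in LR coordinates.
            cy, cx = y / scale, x / scale
            num = 0.0
            den = 0.0
            # Visit the LR neighbours within `radius` of the nearest LR sample.
            for qy in range(round(cy) - radius, round(cy) + radius + 1):
                for qx in range(round(cx) - radius, round(cx) + radius + 1):
                    if not (0 <= qy < h and 0 <= qx < w):
                        continue
                    # Spatial weight: Gaussian in LR-pixel distance.
                    ws = np.exp(-((qy - cy) ** 2 + (qx - cx) ** 2) / (2 * sigma_s ** 2))
                    # Range weight: Gaussian in guidance-intensity difference, so
                    # samples across a guidance edge contribute very little.
                    g_q = guide_hr[qy * scale, qx * scale]
                    wr = np.exp(-((guide_hr[y, x] - g_q) ** 2) / (2 * sigma_r ** 2))
                    num += ws * wr * depth_lr[qy, qx]
                    den += ws * wr
            out[y, x] = num / den
    return out
```

For example, `joint_bilateral_upsample(depth_lr, gray_hr, scale=2)` keeps a depth discontinuity sharp wherever the guide has an edge in the same place, where plain bilinear interpolation would blur it. The nested loops cost O(H·W·(2·radius+1)²), which is fine for a toy demonstration but slow for real images. This hand-crafted range kernel also trusts every guidance edge, including texture edges that have no counterpart in the depth map. That is the "inconsistent structures" problem the abstract describes, and the attention-based fusion it proposes aims to learn which guidance information to use instead.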