Monocular depth estimation via a detail semantic collaborative network for indoor scenes.

Authors

Song Wen, Cui Xu, Xie Yakun, Wang Guohua, Ma Jiexi

Affiliations

School of Architecture, Southwest Jiaotong University, Chengdu, 611756, China.

Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu, 611756, China.

Publication information

Sci Rep. 2025 Mar 31;15(1):10990. doi: 10.1038/s41598-025-96024-4.

Abstract

Monocular depth estimation is crucial for indoor scene reconstruction and plays a significant role in optimizing building energy efficiency, indoor environment modeling, and smart space design. However, indoor scenes exhibit little depth variation, which makes detail features hard to distinguish. Indoor objects are also diverse, and the correlations among them are complicated to express. In addition, the robustness of recent models in these environments still needs improvement. To address these problems, a detail-semantic collaborative network (DSCNet) is proposed for monocular depth estimation of indoor scenes. First, a hierarchical transformer structure captures the contextual features contained in the images. Second, a detail-semantic collaborative structure builds a selective attention feature map to extract detail and semantic information from the feature maps; the extracted features are then fused to improve the perception ability of the network. Finally, the complex correlation among indoor objects is addressed by aggregating semantic and detail features at different levels, which improves model accuracy without increasing the number of parameters. The proposed model is tested on the NYU and SUN datasets. Compared against 14 performance results from recent leading methods, the proposed approach achieves state-of-the-art results. The approach is also discussed and analyzed in terms of stability, robustness, ablation experiments, and availability in indoor scenes.
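
The abstract does not specify how the selective attention feature map is computed, so the following is only a toy illustration of the general idea of selectively fusing a detail branch and a semantic branch, not the DSCNet implementation. The function name `selective_fuse`, the use of plain per-channel vectors, and the choice of a two-way softmax derived from the features themselves (with no learned parameters) are all assumptions made for this sketch.

```python
import math


def selective_fuse(detail, semantic):
    """Blend two equally sized channel vectors with per-channel softmax weights.

    Hypothetical helper: for each channel, a two-way softmax over the detail
    and semantic responses decides how much each branch contributes.
    """
    if len(detail) != len(semantic):
        raise ValueError("feature vectors must have the same length")

    fused = []
    for d, s in zip(detail, semantic):
        # Subtract the larger value before exponentiating for numerical stability.
        m = max(d, s)
        e_d = math.exp(d - m)
        e_s = math.exp(s - m)
        w_d = e_d / (e_d + e_s)  # weight of the detail branch, in (0, 1)
        fused.append(w_d * d + (1.0 - w_d) * s)
    return fused


# Example: the stronger response in each channel dominates the fused value.
detail_vec = [2.0, 0.1, 1.0]
semantic_vec = [0.0, 1.5, 1.0]
fused_vec = selective_fuse(detail_vec, semantic_vec)
```

In a real network the weights would typically come from learned layers applied to pooled features rather than from the raw responses; the sketch only shows the weighting-and-summing step that "selective" fusion implies.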

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6aba/11958687/9b406ffb8d13/41598_2025_96024_Fig1_HTML.jpg
