

Lightweight monocular depth estimation using a fusion-improved transformer.

Authors

Sui Xin, Gao Song, Xu Aigong, Zhang Cong, Wang Changqiang, Shi Zhengxu

Affiliations

School of Geomatics, Liaoning Technical University, Fuxin, 123000, China.

Publication

Sci Rep. 2024 Sep 28;14(1):22472. doi: 10.1038/s41598-024-72682-8.

DOI:10.1038/s41598-024-72682-8
PMID:39341820
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11438899/
Abstract

Existing depth estimation networks often pursue high accuracy while overlooking computational efficiency. This paper proposes a lightweight self-supervised network that combines convolutional neural networks (CNNs) and Transformers as the feature extraction and encoding layers for images, enabling the network to capture both local geometric and global semantic features for depth estimation. First, depthwise-separable convolution is used to construct a dilated-convolution residual module on a shallow network, enlarging the receptive field of shallow CNN feature extraction. In the Transformer, a multi-depthwise-separable-convolution head transposed attention module is proposed to reduce the computational burden of spatial self-attention. In the feedforward network, a two-step gating mechanism is proposed to improve its nonlinear representation ability. Finally, the CNN and Transformer are integrated into a depth estimation network with local-global context interaction. Compared with other lightweight models, this model has fewer parameters and higher estimation accuracy, and it generalizes better across different outdoor datasets. Its inference speed reaches 87 FPS, achieving good real-time performance while balancing inference speed and estimation accuracy.
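The abstract credits depthwise-separable convolution for much of the model's parameter savings. As a rough, hypothetical illustration (not code from the paper — channel sizes and kernel size are made-up example values), the arithmetic below compares the parameter count of a standard convolution with its depthwise-separable factorization:

```python
# Hypothetical sketch: parameter counts for a standard k x k convolution
# versus a depthwise-separable one (depthwise k x k conv per input channel,
# followed by a 1 x 1 pointwise conv). Biases are ignored for simplicity.

def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Parameters of a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv mixes channels
    return depthwise + pointwise

# Example layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = conv_params(64, 128, 3)                    # 64*128*9 = 73728
separable = depthwise_separable_params(64, 128, 3)    # 64*9 + 64*128 = 8768
print(standard, separable, round(standard / separable, 1))  # 73728 8768 8.4
```

For a 3 x 3 kernel the factorization cuts this layer's parameters by roughly 8x, which is the kind of saving that makes the encoder lightweight enough for real-time inference.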


Figures 1-7 (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac59/11438899/5de0925dbd7d/41598_2024_72682_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac59/11438899/31318f0056e3/41598_2024_72682_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac59/11438899/f65f312816e3/41598_2024_72682_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac59/11438899/d960a95bd012/41598_2024_72682_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac59/11438899/c4cc06be2901/41598_2024_72682_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac59/11438899/8df91f746a7a/41598_2024_72682_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac59/11438899/b078c43735d7/41598_2024_72682_Fig7_HTML.jpg

Similar Articles

1. Lightweight monocular depth estimation using a fusion-improved transformer.
Sci Rep. 2024 Sep 28;14(1):22472. doi: 10.1038/s41598-024-72682-8.
2. RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers.
Sensors (Basel). 2022 May 19;22(10):3849. doi: 10.3390/s22103849.
3. Self-Supervised Lightweight Depth Estimation in Endoscopy Combining CNN and Transformer.
IEEE Trans Med Imaging. 2024 May;43(5):1934-1944. doi: 10.1109/TMI.2024.3352390. Epub 2024 May 2.
4. Monocular Depth Estimation: Lightweight Convolutional and Matrix Capsule Feature-Fusion Network.
Sensors (Basel). 2022 Aug 23;22(17):6344. doi: 10.3390/s22176344.
5. Depth Estimation from Light Field Geometry Using Convolutional Neural Networks.
Sensors (Basel). 2021 Sep 10;21(18):6061. doi: 10.3390/s21186061.
6. [A lightweight recurrence prediction model for high grade serous ovarian cancer based on hierarchical transformer fusion metadata].
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Aug 25;41(4):807-817. doi: 10.7507/1001-5515.202308009.
7. MMViT-Seg: A lightweight transformer and CNN fusion network for COVID-19 segmentation.
Comput Methods Programs Biomed. 2023 Mar;230:107348. doi: 10.1016/j.cmpb.2023.107348. Epub 2023 Jan 12.
8. TransConver: transformer and convolution parallel network for developing automatic brain tumor segmentation in MRI images.
Quant Imaging Med Surg. 2022 Apr;12(4):2397-2415. doi: 10.21037/qims-21-919.
9. CVTrack: Combined Convolutional Neural Network and Vision Transformer Fusion Model for Visual Tracking.
Sensors (Basel). 2024 Jan 3;24(1):274. doi: 10.3390/s24010274.
10. Enhancing skin lesion segmentation with a fusion of convolutional neural networks and transformer models.
Heliyon. 2024 May 17;10(10):e31395. doi: 10.1016/j.heliyon.2024.e31395. eCollection 2024 May 30.

References Cited in This Article

1. Make3D: learning 3D scene structure from a single still image.
IEEE Trans Pattern Anal Mach Intell. 2009 May;31(5):824-40. doi: 10.1109/TPAMI.2008.132.