Dmg2Former-AR: Vision Transformers with Adaptive Rescaling for High-Resolution Structural Visual Inspection.

Authors

Eltouny Kareem, Sajedi Seyedomid, Liang Xiao

Affiliations

Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY 14260, USA.

Structural Mechanics & Materials Division, Simpson Gumpertz & Heger, Waltham, MA 02451, USA.

Publication information

Sensors (Basel). 2024 Sep 17;24(18):6007. doi: 10.3390/s24186007.

Abstract

Developments in drones and imaging hardware technology have opened up countless possibilities for enhancing structural condition assessments and visual inspections. However, processing the inspection images requires considerable work hours, leading to delays in the assessment process. This study presents a semantic segmentation architecture that integrates vision transformers with Laplacian pyramid scaling networks, enabling rapid and accurate pixel-level damage detection. Unlike conventional methods that often lose critical details through resampling or cropping high-resolution images, our approach preserves essential inspection-related information such as microcracks and edges using non-uniform image rescaling networks. This innovation allows for detailed damage identification of high-resolution images while significantly reducing the computational demands. Our main contributions in this study are: (1) proposing two rescaling networks that together allow for processing high-resolution images while significantly reducing the computational demands; and (2) proposing Dmg2Former, a low-resolution segmentation network with a Swin Transformer backbone that leverages the saved computational resources to produce detailed visual inspection masks. We validate our method through a series of experiments on publicly available visual inspection datasets, addressing various tasks such as crack detection and material identification. Finally, we examine the computational efficiency of the adaptive rescalers in terms of multiply-accumulate operations and GPU-memory requirements.
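For readers unfamiliar with the Laplacian pyramid idea mentioned in the abstract, the following is a minimal sketch of the classical, non-learned decomposition. It is not the authors' rescaling networks, which are learned and non-uniform. The function names, the 2x2 average-pooling downsampler, and the nearest-neighbour upsampler are all assumptions chosen for brevity. The sketch only illustrates how fine detail (such as edges) can be kept in separate high-resolution bands while a much smaller coarse image is passed on for heavier processing.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging each 2x2 block (assumed stand-in for blur + decimate)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition (assumed, not learned)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarse_residual].

    Image height and width must be divisible by 2**levels.
    """
    pyramid = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        # The detail band holds what the coarser level cannot represent.
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition by upsampling and adding the detail bands back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current

# Hypothetical 64x64 grayscale input standing in for an inspection image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
pyr = build_laplacian_pyramid(image, levels=3)
restored = reconstruct(pyr)
```

In this toy setup the coarse residual has 1/64 of the original pixel count, which gives a rough sense of why running a segmentation network at reduced resolution can lower compute and memory cost. The actual savings reported in the paper depend on the authors' architecture and are not reproduced here.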

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cdd/11436102/51139333f361/sensors-24-06007-g001.jpg
