Convolutional Neural Network-Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification.

Author information

Chibuike Okpala, Yang Xiaopeng

Affiliations

Department of Human Ecology & Technology, Handong Global University, Pohang 37554, Republic of Korea.

School of Global Entrepreneurship and Information Communication Technology, Handong Global University, Pohang 37554, Republic of Korea.

Publication information

Diagnostics (Basel). 2024 Dec 12;14(24):2790. doi: 10.3390/diagnostics14242790.

DOI:10.3390/diagnostics14242790

PMID:39767151

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11727035/

Abstract

BACKGROUND/OBJECTIVES

Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, particularly in medical image analysis. However, ViTs struggle to capture the high-frequency components of images, which are critical for identifying fine-grained patterns. CNNs, in turn, have difficulty capturing long-range dependencies because of their local receptive fields, which limits their ability to model spatial relationships across lung regions.

METHODS

In this paper, we propose a hybrid architecture that integrates ViTs and CNNs within modular component blocks to leverage both local feature extraction and global context capture. In each component block, the CNN extracts local features, which are then passed through the ViT to capture global dependencies. We implemented a gated attention mechanism that combines channel-, spatial-, and element-wise attention to selectively emphasize important features, thereby enhancing the overall feature representation. Furthermore, we incorporated a multi-scale fusion module (MSFM) that fuses features at different scales for a more comprehensive feature representation.
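The abstract does not give the layer definitions, so the following is only a minimal NumPy sketch of the two ideas named above: a gate that combines channel-, spatial-, and element-wise attention, and a fusion step that merges feature maps of different resolutions. The function names, the random stand-in weights, the choice to multiply the three attention maps, and the upsample-then-concatenate fusion are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_attention(x, rng):
    """Gate a feature map x of shape (C, H, W); output has the same shape.

    All weights are random stand-ins for learned parameters (assumption).
    """
    c, _, _ = x.shape
    # Channel attention: one weight per channel, from globally pooled features.
    w_c = rng.standard_normal((c, c)) * 0.1
    channel = sigmoid(w_c @ x.mean(axis=(1, 2)))[:, None, None]   # (C, 1, 1)
    # Spatial attention: one weight per location, from channel-pooled features.
    w_s = rng.standard_normal(2) * 0.1
    spatial = sigmoid(w_s[0] * x.mean(axis=0) + w_s[1] * x.max(axis=0))[None]  # (1, H, W)
    # Element-wise attention: one weight per activation (a 1x1 convolution).
    w_e = rng.standard_normal((c, c)) * 0.1
    element = sigmoid(np.einsum("oc,chw->ohw", w_e, x))           # (C, H, W)
    # Assumed combination: multiply the three maps into a single gate in (0, 1).
    gate = channel * spatial * element
    return x * gate

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def multi_scale_fusion(features):
    """Fuse (C_i, H_i, W_i) maps by resizing to the largest one and concatenating.

    Assumes square maps whose sizes divide the largest size evenly.
    """
    target = max(f.shape[1] for f in features)
    resized = [upsample_nearest(f, target // f.shape[1]) for f in features]
    return np.concatenate(resized, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))        # e.g. features from an early block
gated = gated_attention(x, rng)
fused = multi_scale_fusion([
    gated,
    rng.standard_normal((16, 8, 8)),        # stand-in for a mid-level block
    rng.standard_normal((32, 4, 4)),        # stand-in for a deep block
])
print(gated.shape, fused.shape)
```

Because every attention map passes through a sigmoid, the gate can only attenuate activations, which is one simple way to read "selectively emphasize": features the gate considers unimportant are pushed toward zero while others pass through nearly unchanged.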

RESULTS

Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions.

CONCLUSIONS

Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving medical image classification performance while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging.
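The abstract does not say which calibration metric was used. One common choice is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between accuracy and mean confidence in each bin. The sketch below assumes ECE with equal-width bins purely for illustration; the function name, the bin count, and the toy inputs are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over equal-width confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction matched the label.
    Bins are [k/n_bins, (k+1)/n_bins), with confidence 1.0 placed in the last bin.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Toy example: 75% confidence and 3 of 4 correct is perfectly calibrated.
well_calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
# Toy example: full confidence but only half correct is overconfident.
overconfident = expected_calibration_error([1.0, 1.0], [True, False])
print(well_calibrated, overconfident)
```

A model can reach high accuracy and still be poorly calibrated, which is why a calibration measure is reported separately from accuracy.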

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c136/11727035/37293bac0a12/diagnostics-14-02790-g001.jpg
