Convolutional Neural Network-Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification.

Authors

Chibuike Okpala, Yang Xiaopeng

Affiliations

Department of Human Ecology & Technology, Handong Global University, Pohang 37554, Republic of Korea.

School of Global Entrepreneurship and Information Communication Technology, Handong Global University, Pohang 37554, Republic of Korea.

Publication information

Diagnostics (Basel). 2024 Dec 12;14(24):2790. doi: 10.3390/diagnostics14242790.

DOI:10.3390/diagnostics14242790
PMID:39767151
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11727035/
Abstract

BACKGROUND/OBJECTIVES

Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, especially in medical imaging analysis. However, ViTs struggle to capture the high-frequency components of images, which are critical for identifying fine-grained patterns. CNNs, in turn, have difficulty capturing long-range dependencies because of their local receptive fields, which prevents them from fully capturing the spatial relationships across lung regions.

METHODS

In this paper, we propose a hybrid architecture that integrates ViTs and CNNs within modular component blocks to leverage both local feature extraction and global context capture. In each component block, the CNN extracts the local features, which are then passed through the ViT to capture the global dependencies. We implemented a gated attention mechanism that combines channel-wise, spatial, and element-wise attention to selectively emphasize the important features, thereby enhancing the overall feature representation. Furthermore, we incorporated a multi-scale fusion module (MSFM) into the proposed framework to fuse features at different scales for a more comprehensive feature representation.
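The gating and multi-scale fusion ideas described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three attention types are combined or how the MSFM fuses scales, so the multiplicative combination of gates, the parameter-free gates computed from pooled statistics, and the pool-upsample-average fusion are all assumptions made for illustration. Feature maps are plain nested lists indexed as `feat[channel][row][col]`, and every function name is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feat):
    """One gate per channel, from global average pooling (assumed, no learned weights)."""
    gates = []
    for ch in feat:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(sigmoid(mean))
    return gates


def spatial_gate(feat):
    """One gate per pixel, from the mean activation across channels (assumed)."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]


def gated_attention(feat):
    """Scale each activation by its channel, spatial, and element-wise gates."""
    cg = channel_gate(feat)
    sg = spatial_gate(feat)
    return [[[x * cg[k] * sg[i][j] * sigmoid(x)  # sigmoid(x): element-wise gate
              for j, x in enumerate(row)]
             for i, row in enumerate(ch)]
            for k, ch in enumerate(feat)]


def avg_pool2(ch):
    """2x2 average pooling; assumes even height and width."""
    h, w = len(ch), len(ch[0])
    return [[(ch[i][j] + ch[i][j + 1] + ch[i + 1][j] + ch[i + 1][j + 1]) / 4.0
             for j in range(0, w - 1, 2)] for i in range(0, h - 1, 2)]


def upsample2(ch):
    """Nearest-neighbour upsampling by a factor of 2."""
    out = []
    for row in ch:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def multi_scale_fuse(feat):
    """Average each channel with a coarse (pooled, then upsampled) copy of itself."""
    fused = []
    for ch in feat:
        coarse = upsample2(avg_pool2(ch))
        fused.append([[0.5 * (a + b) for a, b in zip(r1, r2)]
                      for r1, r2 in zip(ch, coarse)])
    return fused


# Toy input: 2 channels of size 2x2.
feat = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 0.0]]]
gated = gated_attention(feat)
fused = multi_scale_fuse(gated)
```

Because every gate lies strictly between 0 and 1, the gated output never exceeds the input in magnitude. In a trained network the gates would come from learned layers rather than from raw pooled statistics.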

RESULTS

Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions.

CONCLUSIONS

Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving medical image classification performance while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging.
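The abstract does not name the calibration metric used. One common choice is the expected calibration error (ECE), which groups predictions into confidence bins and averages the gap between mean confidence and accuracy in each bin, weighted by bin size. The sketch below assumes equal-width bins and a hypothetical function name, and is offered only to make the notion of calibration concrete.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# A model that is always fully confident but right only half of the time.
overconfident = expected_calibration_error([1.0, 1.0], [True, False])
```

A well-calibrated classifier has an ECE close to 0; the overconfident toy case above has a gap of 0.5 between its confidence and its accuracy.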

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c136/11727035/37293bac0a12/diagnostics-14-02790-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c136/11727035/9b6a52ce686d/diagnostics-14-02790-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c136/11727035/bcb49191597f/diagnostics-14-02790-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c136/11727035/1ce38272b1b5/diagnostics-14-02790-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c136/11727035/7c3a4ad68294/diagnostics-14-02790-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c136/11727035/a07daee42f95/diagnostics-14-02790-g006.jpg
