Enhanced Pneumonia Detection in Chest X-Rays Using Hybrid Convolutional and Vision Transformer Networks.

Authors

Mustapha Benzorgat, Zhou Yatong, Shan Chunyan, Xiao Zhitao

Affiliations

School of Electronics and Information Engineering, Hebei University of Technology, Tianjin 300401, China.

NHC Key Laboratory of Hormones and Development, Tianjin Key Laboratory of Metabolic Diseases, Chu Hsien-I Memorial Hospital & Tianjin Institute of Endocrinology, Tianjin Medical University, Tianjin 300134, China.

Publication information

Curr Med Imaging. 2025;21:e15734056326685. doi: 10.2174/0115734056326685250101113959.

Abstract

OBJECTIVE

This study aims to improve pneumonia detection in chest X-rays with a novel hybrid deep learning model that combines Convolutional Neural Networks (CNNs) with modified Swin Transformer blocks. The goals are to improve diagnostic accuracy, reduce misclassifications, and provide a robust, deployable solution for underdeveloped regions where access to conventional diagnostics and treatment is limited.

METHODS

The study developed a hybrid architecture that integrates CNN layers and modified Swin Transformer blocks in a single model. The CNN layers perform initial feature extraction and capture local patterns within the images, while the modified Swin Transformer blocks model long-range dependencies and global context through window-based self-attention. Preprocessing consisted of resizing images to 224x224 pixels and applying Contrast Limited Adaptive Histogram Equalization (CLAHE) to enhance image features. Data augmentation (horizontal flipping, rotation, and zooming) was used to reduce overfitting and improve robustness. Hyperparameters of both the CNN and Swin Transformer components were tuned with Optuna, using Bayesian optimization with the Tree-structured Parzen Estimator.
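The window-based self-attention mentioned above can be illustrated with a minimal sketch. It only shows how a token grid is split into non-overlapping windows and how that changes the number of query-key pairs. It is not the authors' implementation: the 56 x 56 grid, the 7 x 7 window, and the function names `window_partition` and `attention_pairs` are assumptions chosen for illustration.

```python
def window_partition(grid, window):
    """Split an H x W grid (a list of rows) into non-overlapping window x window tiles."""
    height, width = len(grid), len(grid[0])
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            tiles.append([row[left:left + window] for row in grid[top:top + window]])
    return tiles


def attention_pairs(height, width, window=None):
    """Count query-key pairs: global attention if window is None, else attention within windows."""
    tokens = height * width
    if window is None:
        return tokens * tokens
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


# Hypothetical sizes, not taken from the paper: a 56 x 56 token grid and 7 x 7 windows.
grid = [[r * 56 + c for c in range(56)] for r in range(56)]
tiles = window_partition(grid, 7)

print(len(tiles))                  # number of windows
print(attention_pairs(56, 56))     # pairs under global self-attention
print(attention_pairs(56, 56, 7))  # pairs under window-based self-attention
```

With these assumed sizes, restricting attention to windows cuts the number of query-key pairs by a factor equal to the number of windows. This cost reduction is the usual motivation for window-based attention in Swin-style blocks.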

RESULTS

The proposed hybrid model was trained and validated on a dataset provided by the Guangzhou Women and Children's Medical Center. On unseen data, the model achieved an overall accuracy of 98.72% and a loss of 0.064, significantly outperforming a baseline CNN model. Precision was 0.9738 for the normal class and 1.0000 for the pneumonia class, and the overall F1-score was 0.9872. The hybrid model outperformed the CNN model on every reported metric: accuracy, precision, recall, and F1-score. Confusion matrices showed high sensitivity and specificity with few misclassifications.
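For readers less familiar with these metrics, the sketch below shows how precision, recall (sensitivity), specificity, F1-score, and accuracy follow from a binary confusion matrix. The counts are hypothetical and are not the study's data. The function name `binary_metrics` and the choice of pneumonia as the positive class are assumptions for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common metrics for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts (NOT from the paper): pneumonia is treated as the positive class.
pneumonia = binary_metrics(tp=90, fp=0, fn=10, tn=100)

# Metrics for the normal class come from swapping the roles of the two classes.
normal = binary_metrics(tp=100, fp=10, fn=0, tn=90)

for name, value in pneumonia.items():
    print(f"pneumonia {name}: {value:.4f}")
for name, value in normal.items():
    print(f"normal {name}: {value:.4f}")
```

In this toy example, a precision of 1.0 for the pneumonia class means that no normal image was labeled as pneumonia. The pneumonia cases that were missed lower the precision of the normal class instead.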

CONCLUSION

The proposed hybrid CNN-ViT model, which integrates modified Swin Transformer blocks within the CNN architecture, advances pneumonia detection by capturing both local and global features in chest X-ray images. The modifications allow the Swin Transformer blocks to work with the CNN layers, improving the model's ability to represent complex visual patterns and dependencies and yielding superior classification performance. The model's lightweight design eliminates the need for extensive hardware, which facilitates deployment in resource-constrained settings. Beyond improving pneumonia diagnosis, the approach has the potential to enhance patient outcomes and support healthcare providers in underdeveloped regions. Future research will focus on refining the model architecture, incorporating more advanced image processing techniques, and exploring explainable AI methods to provide deeper insight into the model's decision-making process.

