Rajaraman Sivaramakrishnan, Liang Zhaohui, Xue Zhiyun, Antani Sameer
Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States.
Front Artif Intell. 2024 Sep 5;7:1419638. doi: 10.3389/frai.2024.1419638. eCollection 2024.
Deep learning (DL) has significantly advanced medical image classification. However, it often relies on transfer learning (TL) from models pretrained on large, generic non-medical image datasets like ImageNet. Medical images possess unique visual characteristics that such general models may not adequately capture.
This study examines the effectiveness of modality-specific pretext learning strengthened by image denoising and deblurring in enhancing the classification of pediatric chest X-ray (CXR) images into those exhibiting no findings, i.e., normal lungs, or with cardiopulmonary disease manifestations. Specifically, we use an architecture and leverage its encoder in conjunction with a classification head to distinguish normal from abnormal pediatric CXR findings. We benchmark this performance against the traditional TL approach, i.e., the VGG-16 model pretrained only on ImageNet. Measures used for performance evaluation are balanced accuracy, sensitivity, specificity, F-score, Matthews correlation coefficient (MCC), Kappa statistic, and Youden's index.
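As an illustration of the evaluation measures listed above, the following minimal sketch computes them from a binary confusion matrix using their standard textbook definitions. It is not the study's evaluation code: the function name `binary_metrics`, the choice of "abnormal" as the positive class, the use of F1 (beta = 1) for the F-score, and the example counts are all assumptions made for illustration.

```python
from math import sqrt


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion counts.

    Assumption: the positive class is "abnormal", so tp counts abnormal
    images correctly flagged and tn counts normal images correctly passed.
    """
    n = tp + fp + tn + fn
    sensitivity = _safe_div(tp, tp + fn)  # recall on the positive class
    specificity = _safe_div(tn, tn + fp)  # recall on the negative class
    precision = _safe_div(tp, tp + fp)

    # Assumption: F-score is taken to be F1 (harmonic mean, beta = 1).
    f_score = _safe_div(2 * precision * sensitivity, precision + sensitivity)

    balanced_accuracy = (sensitivity + specificity) / 2
    youden_index = sensitivity + specificity - 1

    # Matthews correlation coefficient.
    mcc = _safe_div(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = _safe_div(tp + tn, n)
    p_chance = _safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = _safe_div(p_observed - p_chance, 1 - p_chance)

    return {
        "balanced_accuracy": balanced_accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
        "kappa": kappa,
        "youden_index": youden_index,
    }


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    for name, value in binary_metrics(tp=50, fp=20, tn=80, fn=50).items():
        print(f"{name}: {value:.4f}")
```

Balanced accuracy and Youden's index are both functions of sensitivity plus specificity, which is why they tend to move together on class-imbalanced test sets, while MCC and Kappa additionally account for the class proportions.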
Our findings reveal that models developed from CXR modality-specific pretext encoders substantially outperform the ImageNet-only pretrained model, i.e., Baseline, and achieve significantly higher sensitivity (p < 0.05) with marked improvements in balanced accuracy, F-score, MCC, Kappa statistic, and Youden's index. A novel attention-based fuzzy ensemble of the pretext-learned models further improves performance across these metrics (Balanced accuracy: 0.6376; Sensitivity: 0.4991; F-score: 0.5102; MCC: 0.2783; Kappa: 0.2782; and Youden's index: 0.2751), compared to Baseline (Balanced accuracy: 0.5654; Sensitivity: 0.1983; F-score: 0.2977; MCC: 0.1998; Kappa: 0.1599; and Youden's index: 0.1327).
The superior results of the CXR modality-specific pretext-learned models and their ensemble underscore the potential of this approach as a viable alternative to conventional ImageNet pretraining for medical image classification. Results from this study promote further exploration of medical modality-specific TL techniques in the development of DL models for various medical imaging applications.