Wu Derek, Smith Delaney, VanBerlo Blake, Roshankar Amir, Lee Hoseok, Li Brian, Ali Faraz, Rahman Marwan, Basmaji John, Tschirhart Jared, Ford Alex, VanBerlo Bennett, Durvasula Ashritha, Vannelli Claire, Dave Chintan, Deglint Jason, Ho Jordan, Chaudhary Rushil, Clausdorff Hans, Prager Ross, Millington Scott, Shah Samveg, Buchanan Brian, Arntfield Robert
Department of Medicine, Western University, London, ON N6A 5C1, Canada.
Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
Diagnostics (Basel). 2024 May 22;14(11):1081. doi: 10.3390/diagnostics14111081.
Deep learning (DL) models for medical image classification frequently struggle to generalize to data from outside institutions. Additional clinical data are also rarely collected to comprehensively assess and understand model performance across subgroups. Following the development of a single-center model to identify the lung sliding artifact on lung ultrasound (LUS), we pursued a validation strategy using external LUS data. As annotated LUS data are relatively scarce compared to other medical imaging data, we adopted a novel technique to optimize the use of limited external data to improve model generalizability. Externally acquired LUS data from three tertiary care centers, totaling 641 clips from 238 patients, were used to assess the baseline generalizability of our lung sliding model. We then employed our novel Threshold-Aware Accumulative Fine-Tuning (TAAFT) method to fine-tune the baseline model and determine the minimum amount of data required to achieve predefined performance goals. A subgroup analysis was also performed and Grad-CAM++ explanations were examined. The final model was fine-tuned on one-third of the external dataset and achieved 0.917 sensitivity, 0.817 specificity, and 0.920 area under the receiver operating characteristic curve (AUC) on the external validation dataset, exceeding our predefined performance goals. Subgroup analyses identified the LUS characteristics that most challenged the model's performance. Grad-CAM++ saliency maps highlighted clinically relevant regions on M-mode images. We report a multicenter study that exploits limited available external data to improve the generalizability and performance of our lung sliding model while identifying poorly performing subgroups to inform future iterative improvements. This approach may improve efficiency for DL researchers working with smaller quantities of external validation data.
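The idea of adding external data in increments until predefined performance goals are met can be illustrated with a minimal Python sketch. This is not the authors' TAAFT implementation, whose details the abstract does not give. The function names, the batch-wise accumulation, the retraining on all accumulated data at each step, and the `fine_tune` and `evaluate` callables are all hypothetical placeholders, and the goal values are supplied by the caller because the abstract does not state them.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute sensitivity and specificity from binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def accumulative_fine_tune(
    batches: Sequence[Sequence[Any]],
    fine_tune: Callable[[List[Any]], Any],
    evaluate: Callable[[Any], Dict[str, float]],
    goals: Dict[str, float],
) -> Optional[Tuple[int, Dict[str, float]]]:
    """Add one batch of external data at a time and stop once every goal is met.

    Assumptions (not taken from the source): each step fine-tunes on all data
    accumulated so far, and `evaluate` scores the model on held-out external data.
    Returns (number of batches used, metrics), or None if the goals are never met.
    """
    accumulated: List[Any] = []
    for n_batches, batch in enumerate(batches, start=1):
        accumulated.extend(batch)
        model = fine_tune(accumulated)   # hypothetical training routine
        metrics = evaluate(model)        # hypothetical held-out evaluation
        if all(metrics[name] >= target for name, target in goals.items()):
            return n_batches, metrics
    return None
```

A caller would pass its own training and evaluation routines together with a dictionary of goals such as `{"sensitivity": ..., "specificity": ...}`. The returned batch count then indicates how much external data was needed under these assumptions.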