Impact of retraining and data partitions on the generalizability of a deep learning model in the task of COVID-19 classification on chest radiographs.

Author information

Shenouda Mena, Whitney Heather M, Giger Maryellen L, Armato Samuel G

Affiliations

The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States.

Publication information

J Med Imaging (Bellingham). 2024 Nov;11(6):064503. doi: 10.1117/1.JMI.11.6.064503. Epub 2024 Dec 26.

DOI:10.1117/1.JMI.11.6.064503

PMID:39734609

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11670362/

Abstract

PURPOSE

This study aimed to investigate the impact of different model retraining schemes and data partitioning on model performance in the task of COVID-19 classification on standard chest radiographs (CXRs), in the context of model generalizability.

APPROACH

Two datasets from the same institution were used: Set A (9860 patients, collected from February 20, 2020 to February 3, 2021) and Set B (5893 patients, collected from March 15, 2020 to January 1, 2022). An original deep learning (DL) model, trained and tested on the initial partition of Set A for COVID-19 classification, achieved an area under the curve (AUC) value of 0.76, whereas testing on Set B yielded a significantly lower value of 0.67. To explore this discrepancy, four separate strategies were applied to the original model: (1) retraining using Set B, (2) fine-tuning using Set B, (3) regularization, and (4) repartitioning the training set from Set A 200 times and reporting the resulting AUC values.
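
Strategy (4) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, a toy centroid-difference scorer stands in for the DL model, and the names `make_synthetic`, `fit_centroid_direction`, and `repartition_aucs`, the 80/20 split, and the random seeds are all assumptions made for illustration. The sketch shows only how repeated random repartitioning of a single dataset produces a spread of test AUC values.

```python
import random


def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; tied scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_direction(X, y):
    """Toy stand-in for training: weights = positive centroid - negative centroid."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return [
        sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
        for j in range(len(X[0]))
    ]


def repartition_aucs(X, y, n_repeats=200, test_fraction=0.2, seed=0):
    """Randomly re-split the data n_repeats times, refit, and collect test AUCs."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = int(round(test_fraction * len(X)))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        w = fit_centroid_direction([X[i] for i in train], [y[i] for i in train])
        scores = [sum(wj * xj for wj, xj in zip(w, X[i])) for i in test]
        aucs.append(auc([y[i] for i in test], scores))
    return aucs


def make_synthetic(n=400, seed=1):
    """Hypothetical data: one row per patient, two features, balanced labels."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(0.8 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_synthetic()
aucs = repartition_aucs(X, y)
print(f"min={min(aucs):.2f}  median={sorted(aucs)[len(aucs) // 2]:.2f}  max={max(aucs):.2f}")
```

Even with the data and the scoring rule held fixed, the minimum and maximum test AUC differ across partitions, which is the effect the study examines on real CXR data.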

RESULTS

The model achieved the following AUC values (95% confidence interval) for the four methods: (1) 0.61 [0.56, 0.66]; (2) 0.70 [0.66, 0.73], both on Set B; (3) 0.76 [0.72, 0.79] on the initial test partition of Set A and 0.68 [0.66, 0.70] on Set B; and (4) on repartitions of Set A. The lowest AUC value (0.66 [0.62, 0.69]) of the Set A repartitions was no longer significantly different from the initial 0.67 achieved on Set B.
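
The abstract does not state how the 95% confidence intervals were computed. One common option is a percentile bootstrap over test cases, sketched below. The synthetic labels and scores, the function name `bootstrap_auc_ci`, and the choice of 500 resamples are assumptions for illustration, not details taken from the study.

```python
import random


def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Return (point AUC, lower, upper) using a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lower, upper


# Hypothetical classifier outputs: 120 cases, balanced labels, modest separation.
rng = random.Random(42)
labels = [i % 2 for i in range(120)]
scores = [rng.gauss(0.7 * y, 1.0) for y in labels]

point, lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.2f} [{lower:.2f}, {upper:.2f}]")
```

Intervals obtained this way can be compared across test sets or partitions, although overlapping intervals alone are not a formal test of whether two AUC values differ.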

CONCLUSIONS

Different data repartitions of the same dataset used to train a DL model demonstrated significantly different performance values that helped explain the discrepancy between Set A and Set B and further demonstrated the limitations of model generalizability.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e2f/11670362/e0b936e77ea8/JMI-011-064503-g001.jpg
