Impact of train/test sample regimen on performance estimate stability of machine learning in cardiovascular imaging.

Affiliations

University of California at Los Angeles, Los Angeles, CA, USA.

Division of Artificial Intelligence in Medicine, Departments of Medicine and Cardiology, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Ste. A047N, Los Angeles, CA, USA.

Publication information

Sci Rep. 2021 Jul 14;11(1):14490. doi: 10.1038/s41598-021-93651-5.

DOI:10.1038/s41598-021-93651-5

PMID:34262098

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8280147/

Abstract

As machine learning research in cardiovascular imaging continues to grow, reliable model performance estimates are critical for establishing sound baselines and comparing algorithms. Although the machine learning community generally regards methods such as stratified k-fold cross-validation (CV) as more rigorous than single-split validation, single-split validation remains standard practice in medical research. This is especially concerning given the relatively small datasets used in cardiovascular imaging. We examine how train-test split variation affects the stability of machine learning (ML) model performance estimates under several validation techniques on two real-world cardiovascular imaging datasets: stratified split-sample validation (70/30 and 50/50 train-test splits), tenfold stratified CV, 10 × repeated tenfold stratified CV, bootstrapping (500 × repeated), and leave-one-out (LOO) validation. Split-sample validation produced the widest range in the area under the receiver operating characteristic (ROC) curve (AUC) and statistically significant differences between ROC curves; the other approaches did not. When predictive models are built on relatively small datasets, as is often the case in medical imaging, split-sample validation can yield unstable performance estimates, with AUC ranges exceeding 0.15, so any of the alternative validation methods is recommended instead.
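The comparison described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code, data, or model: the dataset is synthetic (200 samples, 30% positives), the classifier is a simple nearest-centroid scorer, and the CV estimate pools out-of-fold scores into a single AUC. The sketch repeats a stratified 70/30 split and a stratified tenfold CV over 20 random seeds and reports the range of the resulting AUC estimates for each scheme.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_data(n=200, prevalence=0.3, shift=1.0, seed=0):
    """Synthetic two-feature data (hypothetical); positives are shifted by `shift`."""
    rng = random.Random(seed)
    n_pos = round(n * prevalence)
    y = [1] * n_pos + [0] * (n - n_pos)
    X = [[rng.gauss(shift * label, 1.0), rng.gauss(shift * label, 1.0)] for label in y]
    return X, y

def fit_predict(X_train, y_train, X_test):
    """Nearest-centroid scorer: a higher score means closer to the positive centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [sqdist(x, c0) - sqdist(x, c1) for x in X_test]

def stratified_folds(y, k, rng):
    """Shuffle indices within each class, then deal them round-robin into k folds."""
    folds = [[] for _ in range(k)]
    position = 0
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for i in idx:
            folds[position % k].append(i)
            position += 1
    return folds

def split_auc(X, y, test_fraction, seed):
    """One stratified split-sample estimate: train on the rest, score the test part."""
    rng = random.Random(seed)
    test = []
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        test += idx[:round(len(idx) * test_fraction)]
    held_out = set(test)
    train = [i for i in range(len(y)) if i not in held_out]
    scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                         [X[i] for i in test])
    return auc(scores, [y[i] for i in test])

def cv_auc(X, y, k, seed):
    """One stratified k-fold estimate: pool out-of-fold scores, then compute one AUC."""
    scores = [0.0] * len(y)
    for fold in stratified_folds(y, k, random.Random(seed)):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in fold])
        for i, s in zip(fold, preds):
            scores[i] = s
    return auc(scores, y)

X, y = make_data()
split_aucs = [split_auc(X, y, 0.3, seed) for seed in range(20)]
cv_aucs = [cv_auc(X, y, 10, seed) for seed in range(20)]
print("70/30 split AUC range:", round(max(split_aucs) - min(split_aucs), 3))
print("tenfold CV AUC range: ", round(max(cv_aucs) - min(cv_aucs), 3))
```

In a setup like this, each split-sample estimate rests on a small test set (60 samples here), whereas each CV estimate uses every sample once as held-out data, which is one plausible reason the split-sample range tends to be wider. The exact numbers depend on the synthetic data and seeds and say nothing about the paper's datasets.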

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/67f1/8280147/eab30dfc3f12/41598_2021_93651_Fig1_HTML.jpg
