Department of Radiology, National Health Insurance Service Ilsan Hospital, Goyang, Korea.
Research Institute, National Health Insurance Service Ilsan Hospital, Goyang, Korea.
PLoS One. 2021 Aug 12;16(8):e0256152. doi: 10.1371/journal.pone.0256152. eCollection 2021.
This study aims to determine how randomly splitting a dataset into training and test sets affects a machine learning model's estimated performance and the gap between that estimate and the test performance, under different conditions, using real-world brain tumor radiomics data. We conducted two classification tasks of different difficulty levels with magnetic resonance imaging (MRI) radiomics features: (1) a "simple" task, glioblastomas [n = 109] vs. brain metastases [n = 58], and (2) a "difficult" task, low- [n = 163] vs. high-grade [n = 95] meningiomas. Additionally, two undersampled datasets were created by randomly sampling 50% from these datasets. We performed random training-test set splitting for each dataset repeatedly to create 1,000 different training-test set pairs. For each dataset pair, a least absolute shrinkage and selection operator (LASSO) model was trained and evaluated using various validation methods in the training set and then tested on the test set, using the area under the curve (AUC) as the evaluation metric. The AUCs in training and testing varied among different training-test set pairs, especially with the undersampled datasets and the difficult task. The mean (±standard deviation) AUC difference between training and testing was 0.039 (±0.032) for the simple task without undersampling and 0.092 (±0.071) for the difficult task with undersampling. In one training-test set pair for the difficult task without undersampling, for example, the AUC was high in training but much lower in testing (0.882 and 0.667, respectively); in another dataset pair for the same task, however, the AUC was low in training but much higher in testing (0.709 and 0.911, respectively). When the AUC discrepancy between training and testing, or generalization gap, was large, none of the validation methods sufficiently reduced it. Our results suggest that machine learning after a single random training-test set split may lead to unreliable results in radiomics studies, especially those with small sample sizes.
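The repeated-splitting experiment described above can be illustrated with a minimal sketch. This is not the authors' exact pipeline: it assumes a synthetic feature matrix standing in for the radiomics features, uses L1-penalized logistic regression as the LASSO-type classifier, and uses 5-fold cross-validation as one example of a validation method for estimating training performance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for a radiomics feature matrix (X) and binary labels (y);
# sample size and class balance are illustrative, not the study's data.
X, y = make_classification(n_samples=167, n_features=100, n_informative=10,
                           weights=[0.65, 0.35], random_state=0)

n_repeats = 1000          # number of random training-test splits, as in the study
train_aucs, test_aucs = [], []

for seed in range(n_repeats):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    # L1-penalized (LASSO-like) logistic regression after feature standardization.
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0))

    # Estimate training performance with 5-fold cross-validation (one of several
    # possible validation methods), then evaluate on the held-out test set.
    cv_auc = cross_val_score(model, X_tr, y_tr, cv=5, scoring="roc_auc").mean()
    model.fit(X_tr, y_tr)
    test_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

    train_aucs.append(cv_auc)
    test_aucs.append(test_auc)

gaps = np.array(train_aucs) - np.array(test_aucs)
print(f"Mean (SD) generalization gap across splits: {gaps.mean():.3f} ({gaps.std():.3f})")
```

The spread of `gaps` across the 1,000 splits is what the abstract refers to as the variability of the generalization gap; with smaller or harder datasets this spread widens.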