
External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients.

Authors

Eertink Jakoba J, Heymans Martijn W, Zwezerijnen Gerben J C, Zijlstra Josée M, de Vet Henrica C W, Boellaard Ronald

Affiliations

Department of Hematology, Amsterdam UMC Location Vrije Universiteit Amsterdam, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands.

Imaging and Biomarkers, Cancer Center Amsterdam, Amsterdam, The Netherlands.

Publication information

EJNMMI Res. 2022 Sep 11;12(1):58. doi: 10.1186/s13550-022-00931-w.

DOI:10.1186/s13550-022-00931-w
PMID:36089634
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9464671/
Abstract

AIM

Clinical prediction models need to be validated. In this study, we used simulation data to compare various internal and external validation approaches to validate models.

METHODS

Data of 500 patients were simulated using distributions of metabolic tumor volume, standardized uptake value, the maximal distance between the largest lesion and another lesion, WHO performance status and age of 296 diffuse large B cell lymphoma patients. These data were used to predict progression after 2 years based on an existing logistic regression model. Using the simulated data, we applied cross-validation, bootstrapping and holdout (n = 100). We simulated new external datasets (n = 100, n = 200, n = 500) and simulated stage-specific external datasets (1), varied the cut-off for high-risk patients (2) and the false positive and false negative rates (3) and simulated a dataset with EARL2 characteristics (4). All internal and external simulations were repeated 100 times. Model performance was expressed as the cross-validated area under the curve (CV-AUC ± SD) and calibration slope.
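The repeated external-test design described in the methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the intercept, the two coefficients and the standard-normal predictors are made-up placeholders, whereas the study used PET-derived and clinical predictors sampled from real patient distributions. The sketch draws 100 external datasets at each size and summarizes the AUC as mean ± SD.

```python
import math
import random
import statistics

# Hypothetical model for illustration only; these are NOT the coefficients
# of the published DLBCL model.
INTERCEPT = -1.0
COEFS = (0.8, 0.5)  # two made-up, standardized predictors


def simulate_patient(rng):
    """Draw one synthetic patient and return (predicted risk, observed outcome)."""
    x = [rng.gauss(0.0, 1.0) for _ in COEFS]
    lp = INTERCEPT + sum(b * xi for b, xi in zip(COEFS, x))
    risk = 1.0 / (1.0 + math.exp(-lp))
    outcome = 1 if rng.random() < risk else 0
    return risk, outcome


def auc(scores, labels):
    """AUC: probability that a random event outranks a random non-event (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def repeated_external_auc(n_test, n_repeats=100, seed=0):
    """Simulate n_repeats external datasets of size n_test; return (mean AUC, SD of AUC)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        data = [simulate_patient(rng) for _ in range(n_test)]
        scores = [risk for risk, _ in data]
        labels = [outcome for _, outcome in data]
        aucs.append(auc(scores, labels))
    return statistics.mean(aucs), statistics.stdev(aucs)


results = {size: repeated_external_auc(size) for size in (100, 200, 500)}
for size, (mean_auc, sd_auc) in results.items():
    print(f"n={size}: AUC {mean_auc:.2f} ± {sd_auc:.2f}")
```

Because the model is held fixed and only the test data are redrawn, the SD across repeats isolates the uncertainty that comes from the size of the test set alone.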

RESULTS

The cross-validation (0.71 ± 0.06) and holdout (0.70 ± 0.07) resulted in comparable model performances, but the model had a higher uncertainty using a holdout set. Bootstrapping resulted in a CV-AUC of 0.67 ± 0.02. The calibration slope was comparable for these internal validation approaches. Increasing the size of the test set resulted in more precise CV-AUC estimates and smaller SD for the calibration slope. For test datasets with different stages, the CV-AUC increased as Ann Arbor stages increased. As expected, changing the cut-off for high risk and false positive- and negative rates influenced the model performance, which is clearly shown by the low calibration slope. The EARL2 dataset resulted in similar model performance and precision, but calibration slope indicated overfitting.
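The calibration slope reported here is commonly obtained by regressing the observed outcome on the model's linear predictor in a logistic model; a slope near 1 suggests well-calibrated predictions, and a slope below 1 suggests predictions that are too extreme, as happens with overfitting. The sketch below assumes that definition and uses synthetic data with a made-up linear predictor; doubling the linear predictor is just a convenient way to mimic an overfitted model and is not taken from the study.

```python
import math
import random


def fit_calibration(lp, y, n_iter=25):
    """Fit logit P(y=1) = a + b * lp by Newton-Raphson; b is the calibration slope."""
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * x      # gradient w.r.t. slope
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


rng = random.Random(42)
n_patients = 5000

# Synthetic "true" linear predictor and outcomes generated from it (assumed values).
true_lp = [-1.0 + rng.gauss(0.0, 1.0) for _ in range(n_patients)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0 for v in true_lp]

# Predictions from the data-generating model: slope should be close to 1.
_, slope_ok = fit_calibration(true_lp, y)

# Predictions that are twice as extreme on the logit scale: slope drops to about half.
overfit_lp = [2.0 * v for v in true_lp]
_, slope_overfit = fit_calibration(overfit_lp, y)

print(f"calibration slope, correct model:   {slope_ok:.2f}")
print(f"calibration slope, too-extreme model: {slope_overfit:.2f}")
```

In practice a statistics package would be used for the logistic fit; the hand-written Newton-Raphson loop is only there to keep the example dependency-free.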

CONCLUSION

In case of small datasets, it is not advisable to use a holdout or a very small external dataset with similar characteristics. A single small testing dataset suffers from a large uncertainty. Therefore, repeated CV using the full training dataset is preferred instead. Our simulations also demonstrated that it is important to consider the impact of differences in patient population between training and test data, which may ask for adjustment or stratification of relevant variables.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7cc/9464671/a3193290078e/13550_2022_931_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7cc/9464671/3afd07d53a89/13550_2022_931_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7cc/9464671/f530d8254d7a/13550_2022_931_Fig2_HTML.jpg
