
Evaluating the sample size requirements of tree-based ensemble machine learning techniques for clinical risk prediction.

Authors

Kalaycıoğlu Oya, Pavlou Menelaos, Akhanlı Serhat E, de Belder Mark A, Ambler Gareth, Omar Rumana Z

Affiliations

Department of Biostatistics and Medical Informatics, Bolu Abant İzzet Baysal University, Bolu, Türkiye.

Department of Statistical Science, University College London, London, UK.

Publication information

Stat Methods Med Res. 2025 Jul;34(7):1356-1372. doi: 10.1177/09622802251338983. Epub 2025 May 14.

DOI:10.1177/09622802251338983
PMID:40368385
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12308042/
Abstract

Machine learning techniques (MLTs) are increasingly being used to develop clinical risk prediction models for binary health outcomes, but the sample size requirements for developing and validating such models remain unclear. This study investigates whether sample size guidelines that target mean absolute prediction error (MAPE) for logistic regression models can be applied to tree-based ensemble MLTs (bagging, random forests, and boosting). Simulations based on two large cardiovascular datasets were used to evaluate the performance of MLTs in terms of MAPE, calibration, the c-statistic and Brier score, across six data-generating mechanisms (DGMs) and varying sample sizes. When the DGM and analysis model matched, boosting required a sample size 2-3 times larger than recommended; random forests and bagging did not achieve the target MAPE even with a 12-fold increase. For a neutral DGM that did not match any of the analysis models, logistic regression with only main effects and boosting resulted in target MAPE values with a 12-fold increase in the recommended sample size. For external validation, our simulations showed that sample size guidelines to achieve a target precision of the estimated c-statistic were suitable, and thus may be used to inform sample size calculations for MLTs.
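The performance measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`mape`, `brier_score`, `c_statistic`) and the toy data are assumptions made for illustration. Note that MAPE compares estimated risks with the true risks, so it can only be computed in a simulation where the data-generating mechanism is known.

```python
def mape(p_true, p_hat):
    """Mean absolute prediction error: average |estimated risk - true risk|.

    Needs the true risks, so it is only available in simulation settings.
    """
    return sum(abs(h - t) for t, h in zip(p_true, p_hat)) / len(p_true)


def brier_score(y, p_hat):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((h - o) ** 2 for o, h in zip(y, p_hat)) / len(y)


def c_statistic(y, p_hat):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied pairs count as one half."""
    events = [h for o, h in zip(y, p_hat) if o == 1]
    nonevents = [h for o, h in zip(y, p_hat) if o == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


# Hypothetical toy data: true risks, model estimates, observed outcomes.
p_true = [0.1, 0.4, 0.8]
p_hat = [0.2, 0.3, 0.6]
y = [0, 1, 1]

print("MAPE:", mape(p_true, p_hat))
print("Brier score:", brier_score(y, p_hat))
print("c-statistic:", c_statistic(y, p_hat))
```

The pairwise loop in `c_statistic` is quadratic in the number of observations; it is written for readability, and a rank-based formula would be preferable for large validation sets.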

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e4a/12308042/0b34ccf695c6/10.1177_09622802251338983-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e4a/12308042/00fcefaf83e9/10.1177_09622802251338983-fig1.jpg
