

Larger sample sizes are needed when developing a clinical prediction model using machine learning in oncology: methodological systematic review.

Author information

Tsegaye Biruk, Snell Kym I E, Archer Lucinda, Kirtley Shona, Riley Richard D, Sperrin Matthew, Van Calster Ben, Collins Gary S, Dhiman Paula

Affiliations

Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK.

Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK; Institute of Translational Medicine, National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK.

Publication information

J Clin Epidemiol. 2025 Apr;180:111675. doi: 10.1016/j.jclinepi.2025.111675. Epub 2025 Jan 13.

Abstract

BACKGROUND AND OBJECTIVES

Having a sufficient sample size is crucial when developing a clinical prediction model. We reviewed details of sample size in studies developing prediction models for binary outcomes using machine learning (ML) methods within oncology and compared the sample size used to develop the models with the minimum required sample size needed when developing a regression-based model (N).

METHODS

We searched the Medline (via OVID) database for studies published in December 2022 that developed a prediction model using ML methods. We reviewed how sample sizes were justified. We calculated N, the minimum sample size required when developing a regression-based model, and compared it with the sample size actually used to develop the models.
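The abstract does not restate how N is computed. The sketch below assumes N follows the widely used criteria of Riley and colleagues for binary-outcome regression models (implemented, for example, in the pmsampsize package). Under that assumption, N is the largest of three sample sizes: one targeting an expected shrinkage of at least 0.9 to limit overfitting, one keeping the optimism in Nagelkerke's R² within 0.05, and one estimating the overall outcome proportion within ±0.05. The function name `min_sample_size_binary` and all inputs are hypothetical illustrations, not values from the review.

```python
import math


def min_sample_size_binary(p, r2_cs, prevalence, shrinkage=0.9, delta=0.05, margin=0.05):
    """Minimum development sample size for a binary-outcome regression model.

    Assumed to follow the Riley et al. criteria (illustrative sketch only):
      1. expected uniform shrinkage >= `shrinkage` (limits overfitting),
      2. apparent vs adjusted Nagelkerke R^2 differ by <= `delta`,
      3. overall outcome proportion estimated within +/- `margin` (95% CI).
    p          -- number of candidate predictor parameters
    r2_cs      -- anticipated Cox-Snell R^2 of the model
    prevalence -- anticipated outcome proportion (phi)
    """
    phi = prevalence
    if not 0 < phi < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")

    # Largest Cox-Snell R^2 attainable given phi: 1 - exp(2 * per-observation null log-likelihood)
    ln_lnull = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    max_r2_cs = 1 - math.exp(2 * ln_lnull)
    if not 0 < r2_cs < max_r2_cs:
        raise ValueError("r2_cs must be positive and below its maximum for this prevalence")

    # Criterion 1: n = p / ((S - 1) * ln(1 - R2_cs / S)) with target shrinkage S
    n1 = p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))

    # Criterion 2: shrinkage implied by an optimism of at most `delta` in Nagelkerke's R^2
    s2 = r2_cs / (r2_cs + delta * max_r2_cs)
    n2 = p / ((s2 - 1) * math.log(1 - r2_cs / s2))

    # Criterion 3: 95% CI half-width of `margin` around the overall outcome proportion
    n3 = (1.96 / margin) ** 2 * phi * (1 - phi)

    n = math.ceil(max(n1, n2, n3))
    return {
        "n_overfit": math.ceil(n1),
        "n_optimism": math.ceil(n2),
        "n_risk": math.ceil(n3),
        "n": n,                          # N: the largest of the three criteria
        "events": math.ceil(n * phi),    # required participants with the event
    }


# Hypothetical inputs (illustrative only): 24 predictor parameters,
# anticipated Cox-Snell R^2 of 0.288, anticipated outcome proportion of 0.174
print(min_sample_size_binary(p=24, r2_cs=0.288, prevalence=0.174))
```

The analyst must supply the anticipated Cox-Snell R² and outcome proportion, typically from prior studies in the same setting. Multiplying N by the outcome proportion gives the required number of participants with the event. That is the scale on which a shortfall such as the review's median deficit is expressed.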

RESULTS

Only one of the 36 included studies justified its sample size. We were able to calculate N for 17 (47%) studies. Five of these 17 studies met N, allowing the overall risk to be estimated precisely and overfitting to be minimized. When developing the ML models, there was a median deficit of 302 participants with the event (n = 17; range: -21,331 to 2298). A further three of the 17 studies met only the sample size required to precisely estimate the overall risk.
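The deficit can be read as the required number of events minus the number actually available in each study. The sketch below assumes that sign convention: a positive value is a shortfall and a negative value is a surplus. This is consistent with a range that runs from a large negative value to a positive one. The helper names and the five studies' counts are hypothetical and are not data from the review.

```python
import statistics


def event_deficits(required_events, available_events):
    """Per-study deficit in events: required minus available.

    Assumed sign convention (illustration only): positive = fewer events than
    required (shortfall); zero or negative = requirement met (surplus).
    """
    if len(required_events) != len(available_events):
        raise ValueError("need one required and one available count per study")
    return [req - avail for req, avail in zip(required_events, available_events)]


def summarise(deficits):
    """Median and range of deficits, plus how many studies met the requirement."""
    return {
        "median": statistics.median(deficits),
        "min": min(deficits),
        "max": max(deficits),
        "met_requirement": sum(d <= 0 for d in deficits),
    }


# Hypothetical event counts for five studies (not data from the review)
required = [120, 450, 80, 300, 1000]
available = [200, 150, 60, 100, 3000]
d = event_deficits(required, available)
print(d)
print(summarise(d))
```

A single large surplus, such as the hypothetical fifth study, stretches the range but barely moves the median. This is why the median deficit and its range are reported together.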

CONCLUSION

Studies developing a prediction model using ML in oncology seldom justified their sample size, and sample sizes were often smaller than N. As ML models almost certainly require a larger sample size than regression models, the true deficit is likely larger still. We recommend that researchers consider and report their sample size and, at a minimum, meet the sample size required when developing a regression-based model.
