Sample Size Requirements for Popular Classification Algorithms in Tabular Clinical Data: Empirical Study.

Authors

Silvey Scott, Liu Jinze

Affiliation

Department of Biostatistics, School of Public Health, Virginia Commonwealth University, Richmond, VA, United States.

Publication

J Med Internet Res. 2024 Dec 17;26:e60231. doi: 10.2196/60231.

DOI:10.2196/60231
PMID:39689306
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11688588/
Abstract

BACKGROUND

The performance of a classification algorithm eventually reaches a point of diminishing returns, where adding more samples no longer improves the results. Thus, there is a need to determine an optimal sample size that maximizes performance while accounting for computational burden or budgetary concerns.

OBJECTIVE

This study aimed to determine optimal sample sizes and the relationships between sample size and dataset-level characteristics over a variety of binary classification algorithms.

METHODS

A total of 16 large open-source datasets were collected, each containing a binary clinical outcome. Furthermore, 4 machine learning algorithms were assessed: XGBoost (XGB), random forest (RF), logistic regression (LR), and neural networks (NNs). For each dataset, the cross-validated area under the curve (AUC) was calculated at increasing sample sizes, and learning curves were fit. Sample sizes needed to reach the observed full-dataset AUC minus 2 points (0.02) were calculated from the fitted learning curves and compared across the datasets and algorithms. The following dataset-level characteristics were examined: minority class proportion, full-dataset AUC, number of features, type of features, and degree of nonlinearity. Negative binomial regression models were used to quantify relationships between these characteristics and expected sample sizes within each algorithm. A total of 4 multivariable models were constructed, which selected the best-fitting combination of dataset-level characteristics.
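
The learning-curve step described above can be illustrated with a minimal sketch. The abstract does not state which functional form the authors fit, so the inverse power law AUC(n) = a - b * n^(-c) used here is an assumption, as are the function names and the synthetic AUC values; none of this is the authors' code or data. The sketch fits the curve by grid search over c (with a and b from ordinary least squares) and then inverts it to find the sample size at which the fitted AUC reaches the full-dataset AUC minus 0.02.

```python
import math

def fit_inverse_power_law(ns, aucs):
    """Fit auc ~ a - b * n**(-c). Assumed curve form, not taken from the paper.

    For each candidate exponent c the model is linear in n**(-c), so a and b
    come from simple least squares; the c with the smallest squared error wins.
    """
    best = None
    for i in range(5, 201):                     # candidate c values 0.05 .. 2.00
        c = i / 100
        xs = [n ** (-c) for n in ns]
        x_mean = sum(xs) / len(xs)
        y_mean = sum(aucs) / len(aucs)
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, aucs))
        slope = sxy / sxx                       # slope of auc on n**(-c), equals -b
        a = y_mean - slope * x_mean             # asymptotic AUC as n grows
        b = -slope
        sse = sum((a - b * x - y) ** 2 for x, y in zip(xs, aucs))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]

def required_n(a, b, c, target_auc):
    """Smallest integer n whose fitted AUC is at least target_auc."""
    if target_auc >= a:
        raise ValueError("target is at or above the fitted asymptote")
    return math.ceil((b / (a - target_auc)) ** (1 / c))

# Synthetic, noise-free learning curve (hypothetical values for illustration only).
ns = [100, 200, 500, 1000, 2000, 5000, 10000, 50000]
aucs = [0.85 - 1.5 * n ** (-0.5) for n in ns]

a, b, c = fit_inverse_power_law(ns, aucs)
full_auc = aucs[-1]                             # AUC observed at the largest sample size
n_needed = required_n(a, b, c, full_auc - 0.02)
print(f"fitted a={a:.3f}, b={b:.3f}, c={c:.2f}; n for full AUC - 0.02: {n_needed}")
```

With real cross-validated AUC values the points would be noisy, and a nonlinear least-squares routine would likely be a better choice than a grid search; the grid keeps this sketch dependency-free.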

RESULTS

Among the 16 datasets (full-dataset sample sizes ranging from 70,000 to 1,000,000), median sample sizes were 9960 (XGB), 3404 (RF), 696 (LR), and 12,298 (NN) to reach AUC stability. For all 4 algorithms, more balanced classes (multiplier: 0.93-0.96 for a 1% increase in minority class proportion) were associated with decreased sample size. Other characteristics varied in importance across algorithms; in general, more features, weaker features, and more complex relationships between the predictors and the response increased expected sample sizes. In multivariable analysis, the top selected predictors were minority class proportion among all 4 algorithms assessed, full-dataset AUC (XGB, RF, and NN), and dataset nonlinearity (XGB, RF, and NN). For LR, the top predictors were minority class proportion, percentage of strong linear features, and number of features. Final multivariable sample size models had high goodness-of-fit, with dataset-level predictors explaining a majority (66.5%-84.5%) of the total deviance in the data among all 4 models.
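
As a rough worked example of how the reported multipliers compound, the sketch below applies a per-point multiplier repeatedly. It assumes that "a 1% increase" means one percentage point of minority class proportion and that the multiplier holds uniformly over the range considered, which is how a log-link count model such as negative binomial regression is usually read. The baseline of 10,000 samples and the 10-point shift are hypothetical and are not figures from the study.

```python
def scaled_sample_size(baseline_n, multiplier_per_point, delta_points):
    """Expected sample size after shifting minority class proportion by delta_points.

    Under a log link, each one-point increase multiplies the expected
    sample size by the same factor, so the effect compounds geometrically.
    """
    return baseline_n * multiplier_per_point ** delta_points

baseline_n = 10_000      # hypothetical requirement at the starting class balance
delta_points = 10        # hypothetical shift, e.g. from 5% to 15% minority class

low = scaled_sample_size(baseline_n, 0.93, delta_points)   # strongest reported multiplier
high = scaled_sample_size(baseline_n, 0.96, delta_points)  # weakest reported multiplier
print(f"expected requirement after the shift: about {low:.0f} to {high:.0f} samples")
```

Under these assumptions, a 10-point improvement in class balance would cut the expected requirement to roughly half to two-thirds of the baseline.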

CONCLUSIONS

The sample sizes needed to reach AUC stability among 4 popular classification algorithms vary by dataset and method and are associated with dataset-level characteristics that can be influenced or estimated before the start of a research study.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b36c/11688588/ee6deb130aa0/jmir_v26i1e60231_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b36c/11688588/af8ca6a51adf/jmir_v26i1e60231_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b36c/11688588/814188e49e85/jmir_v26i1e60231_fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b36c/11688588/cc93d2c4d8d7/jmir_v26i1e60231_fig4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b36c/11688588/cba274f2e212/jmir_v26i1e60231_fig5.jpg
Similar articles

1
Sample Size Requirements for Popular Classification Algorithms in Tabular Clinical Data: Empirical Study.
J Med Internet Res. 2024 Dec 17;26:e60231. doi: 10.2196/60231.
2
Energy Efficiency of Inference Algorithms for Clinical Laboratory Data Sets: Green Artificial Intelligence Study.
J Med Internet Res. 2022 Jan 25;24(1):e28036. doi: 10.2196/28036.
3
[Constructing a predictive model for the death risk of patients with septic shock based on supervised machine learning algorithms].
Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. 2024 Apr;36(4):345-352. doi: 10.3760/cma.j.cn121430-20230930-00832.
4
Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
5
Predictive etiological classification of acute ischemic stroke through interpretable machine learning algorithms: a multicenter, prospective cohort study.
BMC Med Res Methodol. 2024 Sep 10;24(1):199. doi: 10.1186/s12874-024-02331-1.
6
Prediction and feature selection of low birth weight using machine learning algorithms.
J Health Popul Nutr. 2024 Oct 12;43(1):157. doi: 10.1186/s41043-024-00647-8.
7
Comparing machine learning algorithms to predict COVID-19 mortality using a dataset including chest computed tomography severity score data.
Sci Rep. 2023 Jul 13;13(1):11343. doi: 10.1038/s41598-023-38133-6.
8
Identifying determinants of malnutrition in under-five children in Bangladesh: insights from the BDHS-2022 cross-sectional study.
Sci Rep. 2025 Apr 24;15(1):14336. doi: 10.1038/s41598-025-99288-y.
9
Can Predictive Modeling Tools Identify Patients at High Risk of Prolonged Opioid Use After ACL Reconstruction?
Clin Orthop Relat Res. 2020 Jul;478(7):0-1618. doi: 10.1097/CORR.0000000000001251.
10
Application of machine learning algorithms to identify people with low bone density.
Front Public Health. 2024 Apr 25;12:1347219. doi: 10.3389/fpubh.2024.1347219. eCollection 2024.

Cited by

1
Development of a machine learning-based prediction model for serious bacterial infections in febrile young infants.
BMJ Paediatr Open. 2025 Jul 30;9(1):e003548. doi: 10.1136/bmjpo-2025-003548.
2
Towards universal early screening for cerebral palsy: a roadmap for automated General Movements Assessment.
EClinicalMedicine. 2025 Jul 22;86:103379. doi: 10.1016/j.eclinm.2025.103379. eCollection 2025 Aug.
3
How to use learning curves to evaluate the sample size for malaria prediction models developed using machine learning algorithms.
Malar J. 2025 Jul 24;24(1):242. doi: 10.1186/s12936-025-05479-3.
4
Interpretable machine learning for depression recognition with spatiotemporal gait features among older adults: a cross-sectional study in Xiamen, China.
BMC Geriatr. 2025 Jul 2;25(1):453. doi: 10.1186/s12877-025-06101-6.
5
Optimal Machine Learning Models for Developing Prognostic Predictions in Patients With Advanced Cancer.
Cureus. 2024 Dec 22;16(12):e76227. doi: 10.7759/cureus.76227. eCollection 2024 Dec.

References

1
Machine Learning and Health Science Research: Tutorial.
J Med Internet Res. 2024 Jan 30;26:e50890. doi: 10.2196/50890.
2
Development and Validation of a Machine Learning Prediction Model of Posttraumatic Stress Disorder After Military Deployment.
JAMA Netw Open. 2023 Jun 1;6(6):e2321273. doi: 10.1001/jamanetworkopen.2023.21273.
3
A Prehospital Triage System to Detect Traumatic Intracranial Hemorrhage Using Machine Learning Algorithms.
JAMA Netw Open. 2022 Jun 1;5(6):e2216393. doi: 10.1001/jamanetworkopen.2022.16393.
4
Potential applications and performance of machine learning techniques and algorithms in clinical practice: A systematic review.
Int J Med Inform. 2022 Mar;159:104679. doi: 10.1016/j.ijmedinf.2021.104679. Epub 2021 Dec 31.
5
Digital medicine and the curse of dimensionality.
NPJ Digit Med. 2021 Oct 28;4(1):153. doi: 10.1038/s41746-021-00521-5.
6
A tutorial on calibration measurements and calibration models for clinical prediction models.
J Am Med Inform Assoc. 2020 Apr 1;27(4):621-633. doi: 10.1093/jamia/ocz228.
7
Artificial intelligence in healthcare: past, present and future.
Stroke Vasc Neurol. 2017 Jun 21;2(4):230-243. doi: 10.1136/svn-2017-000101. eCollection 2017 Dec.
8
Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints.
BMC Med Res Methodol. 2014 Dec 22;14:137. doi: 10.1186/1471-2288-14-137.
9
Predicting sample size required for classification performance.
BMC Med Inform Decis Mak. 2012 Feb 15;12:8. doi: 10.1186/1472-6947-12-8.
10
Regularization Paths for Generalized Linear Models via Coordinate Descent.
J Stat Softw. 2010;33(1):1-22.