
Evaluation of a decided sample size in machine learning applications.

Affiliations

Institute of Cognitive Neuroscience, National Central University, Zhongda Rd, No. 300, Zhongli District, Taoyuan City, 320317, Taiwan, ROC.

Taiwan International Graduate Program in Interdisciplinary Neuroscience, National Central University and Academia Sinica, Taipei, Taiwan, ROC.

Publication information

BMC Bioinformatics. 2023 Feb 14;24(1):48. doi: 10.1186/s12859-023-05156-9.

DOI: 10.1186/s12859-023-05156-9
PMID: 36788550
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9926644/
Abstract

BACKGROUND

An appropriate sample size is essential for obtaining precise and reliable study outcomes. In machine learning (ML), studies with inadequate samples suffer from overfitting and have a lower probability of producing true effects; increasing the sample size improves prediction accuracy, but beyond a certain sample size the change may no longer be significant. Existing statistical approaches that determine sample size from the standardized mean difference, effect size, and statistical power are potentially biased because of miscalculations or missing experimental details. This study aims to design criteria for evaluating sample size in ML studies. To derive the criteria, we examined the average and grand effect sizes and the performance of five ML methods on simulated datasets and three real datasets. We systematically increased the sample size by random sampling, starting from 16, and examined its impact on the classifiers' performance and on both effect sizes. Tenfold cross-validation was used to quantify accuracy.
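The procedure described above (grow the sample by random sampling from 16 upward, then track effect size and tenfold cross-validated accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the five ML methods nor the formulas for the average and grand effect sizes, so the sketch substitutes a nearest-centroid classifier, takes the "average effect size" to be the mean absolute per-feature Cohen's d, draws class-balanced subsamples, and omits the grand effect size. All function names and the Gaussian simulation are hypothetical.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def simulate(n_per_class, n_features, shift, rng):
    """Two unit-variance Gaussian classes; class 1 is shifted by `shift`."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def average_effect_size(X, y):
    """Assumed definition: mean absolute Cohen's d across features."""
    ds = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        ds.append(abs(cohens_d(a, b)))
    return statistics.mean(ds)


def centroid(rows):
    """Feature-wise mean of a list of rows."""
    return [statistics.mean(col) for col in zip(*rows)]


def cv_accuracy(X, y, k, rng):
    """k-fold cross-validated accuracy of a nearest-centroid classifier."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        c0 = centroid([X[i] for i in train if y[i] == 0])
        c1 = centroid([X[i] for i in train if y[i] == 1])
        for i in fold:
            d0 = sum((u - v) ** 2 for u, v in zip(X[i], c0))
            d1 = sum((u - v) ** 2 for u, v in zip(X[i], c1))
            predicted = 1 if d1 < d0 else 0
            correct += int(predicted == y[i])
    return correct / len(X)


def sweep(sizes, shift, seed=0, n_features=5, pool_per_class=500):
    """For each total sample size, draw a balanced random subsample and
    return (n, average effect size, tenfold CV accuracy)."""
    rng = random.Random(seed)
    X, y = simulate(pool_per_class, n_features, shift, rng)
    idx0 = [i for i, lab in enumerate(y) if lab == 0]
    idx1 = [i for i, lab in enumerate(y) if lab == 1]
    results = []
    for n in sizes:
        chosen = rng.sample(idx0, n // 2) + rng.sample(idx1, n // 2)
        Xs = [X[i] for i in chosen]
        ys = [y[i] for i in chosen]
        results.append((n, average_effect_size(Xs, ys),
                        cv_accuracy(Xs, ys, 10, rng)))
    return results


if __name__ == "__main__":
    for n, d, acc in sweep([16, 32, 64, 128, 256], shift=1.0):
        print(f"n={n:4d}  avg |d|={d:.2f}  10-fold accuracy={acc:.2f}")
```

Setting `shift` close to 0 mimics an indeterminate dataset, while a larger `shift` mimics a dataset with good discriminative power between the two classes.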

RESULTS

The results show that, when a dataset discriminates well between the two classes, the effect sizes and classification accuracies increase and the variances of the effect sizes shrink as the number of samples grows. By contrast, indeterminate datasets had poor effect sizes and classification accuracies, and neither improved with increasing sample size, in both simulated and real datasets. A good dataset exhibited a significant difference in average and grand effect sizes. Based on these findings, we derived two criteria for assessing a decided sample size that combine the effect size and the ML accuracy: the sample size is considered suitable when it yields appropriate effect sizes (≥ 0.5) and ML accuracy (≥ 80%). Beyond an appropriate sample size, adding samples brings little benefit because it does not significantly change the effect size or the accuracy, so that sample size offers a good cost-benefit ratio.
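The two criteria stated above (effect size ≥ 0.5 and ML accuracy ≥ 80%) can be written as a simple check. The sketch below is illustrative only: the function names are hypothetical, comparing the magnitude of the effect size is an assumption, and the numbers in the usage example are made up rather than taken from the paper.

```python
from typing import Iterable, Optional, Tuple


def meets_criteria(effect_size: float, accuracy: float,
                   min_effect: float = 0.5,
                   min_accuracy: float = 0.80) -> bool:
    """True when both thresholds from the abstract are met.

    Assumption: the magnitude of the effect size is what matters,
    so the absolute value is compared against `min_effect`.
    """
    return abs(effect_size) >= min_effect and accuracy >= min_accuracy


def smallest_suitable_size(
    results: Iterable[Tuple[int, float, float]]
) -> Optional[int]:
    """Given (n, effect size, accuracy) triples, return the smallest n
    that satisfies both criteria, or None if no size qualifies."""
    for n, effect_size, accuracy in sorted(results):
        if meets_criteria(effect_size, accuracy):
            return n
    return None


if __name__ == "__main__":
    # Made-up illustrative values, not results from the paper.
    observed = [(16, 0.30, 0.62), (32, 0.55, 0.78), (64, 0.70, 0.85)]
    print(smallest_suitable_size(observed))
```

Returning the smallest qualifying size reflects the abstract's point that samples added beyond an appropriate size do not significantly change either measure.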

CONCLUSION

We believe that these practical criteria can serve as a reference for both authors and editors when evaluating whether the selected sample size is adequate for a study.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d006/9926644/0f02b63e51f7/12859_2023_5156_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d006/9926644/334eda9c185d/12859_2023_5156_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d006/9926644/c5c3a5b98a56/12859_2023_5156_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d006/9926644/749fe735e1ec/12859_2023_5156_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d006/9926644/05b0f0f9d3a1/12859_2023_5156_Fig5_HTML.jpg
