Institute of Cognitive Neuroscience, National Central University, Zhongda Rd, No. 300, Zhongli District, Taoyuan City, 320317, Taiwan, ROC.
Taiwan International Graduate Program in Interdisciplinary Neuroscience, National Central University and Academia Sinica, Taipei, Taiwan, ROC.
BMC Bioinformatics. 2023 Feb 14;24(1):48. doi: 10.1186/s12859-023-05156-9.
An appropriate sample size is essential for obtaining precise and reliable study outcomes. In machine learning (ML), studies with inadequate samples suffer from overfitting and have a lower probability of detecting true effects. Increasing the sample size improves prediction accuracy, but beyond a certain size the improvement may no longer be significant. Existing statistical approaches that determine sample size from the standardized mean difference, effect size, and statistical power are potentially biased due to miscalculations or missing experimental details. This study aims to design criteria for evaluating sample size in ML studies. We examined the average and grand effect sizes and the performance of five ML methods on simulated datasets and three real datasets to derive these criteria. We systematically increased the sample size, starting from 16, by random sampling and examined its impact on the classifiers' performance and on both effect sizes. Tenfold cross-validation was used to quantify accuracy.
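The incremental procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: it uses a single simulated feature, balanced classes, Cohen's d with a pooled standard deviation as the effect size (the abstract's "average" and "grand" effect sizes are not defined here, so they are not reproduced), and a simple nearest-class-mean classifier standing in for the five ML methods. The function names `cohens_d`, `cv_accuracy`, and `sample_size_curve` and the doubling schedule of sample sizes are hypothetical.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed effect-size definition)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return abs(statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def cv_accuracy(xs, ys, k=10, seed=0):
    """k-fold cross-validated accuracy of a nearest-class-mean classifier."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit: one mean per class, estimated on the training folds only.
        m0 = statistics.mean(xs[i] for i in train if ys[i] == 0)
        m1 = statistics.mean(xs[i] for i in train if ys[i] == 1)
        # Predict: assign each held-out sample to the closer class mean.
        for i in fold:
            pred = 0 if abs(xs[i] - m0) <= abs(xs[i] - m1) else 1
            correct += pred == ys[i]
    return correct / len(xs)


def sample_size_curve(class0, class1, sizes, seed=0):
    """For each total sample size n, draw n/2 samples per class at random
    and record (n, effect size, cross-validated accuracy)."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        a = rng.sample(class0, n // 2)
        b = rng.sample(class1, n // 2)
        xs = a + b
        ys = [0] * len(a) + [1] * len(b)
        rows.append((n, cohens_d(a, b), cv_accuracy(xs, ys, seed=seed)))
    return rows


# Simulated, well-separated two-class data (illustrative values only).
rng = random.Random(42)
pool0 = [rng.gauss(0.0, 1.0) for _ in range(500)]
pool1 = [rng.gauss(2.0, 1.0) for _ in range(500)]

curve = sample_size_curve(pool0, pool1, sizes=[16, 32, 64, 128, 256])
for n, d, acc in curve:
    print(f"n={n:4d}  effect size={d:.2f}  accuracy={acc:.2f}")
```

Repeating the draw several times per sample size, rather than once as here, would also expose the variance of the effect size at each step.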
The results demonstrate that, when a dataset discriminates well between the two classes, the effect sizes and classification accuracies increase and the variances of the effect sizes shrink as samples are added. By contrast, indeterminate datasets had poor effect sizes and classification accuracies, which did not improve with increasing sample size in either the simulated or the real datasets. A good dataset exhibited a significant difference in average and grand effect sizes. Based on these findings, we derived two criteria that combine effect size and ML accuracy to assess a chosen sample size. The sample size is considered suitable when it yields an appropriate effect size (≥ 0.5) and ML accuracy (≥ 80%). Beyond an appropriate sample size, adding samples brings little benefit because the effect size and accuracy no longer change significantly, so stopping at that size gives a good cost-benefit ratio.
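As a rough illustration of how the two thresholds might be applied, the sketch below checks a list of (sample size, effect size, accuracy) measurements against them. The function names `meets_criteria` and `smallest_adequate_size` are hypothetical, accuracy is assumed to be expressed as a fraction between 0 and 1, and the numbers in the usage example are made up for illustration, not results from the study.

```python
def meets_criteria(effect_size, accuracy, min_effect=0.5, min_accuracy=0.80):
    """True when both thresholds from the abstract are met
    (effect size >= 0.5 and accuracy >= 80%)."""
    return effect_size >= min_effect and accuracy >= min_accuracy


def smallest_adequate_size(curve, min_effect=0.5, min_accuracy=0.80):
    """Return the smallest sample size whose measurements meet both
    thresholds, or None if no tested size does."""
    for n, effect_size, accuracy in sorted(curve):
        if meets_criteria(effect_size, accuracy, min_effect, min_accuracy):
            return n
    return None


# Hypothetical measurements: (sample size, effect size, accuracy).
good_dataset = [(16, 0.30, 0.70), (32, 0.60, 0.85), (64, 0.70, 0.90)]
indeterminate_dataset = [(16, 0.10, 0.52), (32, 0.12, 0.55), (64, 0.11, 0.54)]

print(smallest_adequate_size(good_dataset))
print(smallest_adequate_size(indeterminate_dataset))
```

Returning `None` for the second list mirrors the observation that an indeterminate dataset does not become adequate simply by adding samples.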
We believe that these practical criteria can serve as a reference for both authors and editors when evaluating whether the selected sample size is adequate for a study.