A solution to minimum sample size for regressions.

Affiliation

Department of Biology, University of Central Florida, Orlando, Florida, United States of America.

Publication information

PLoS One. 2020 Feb 21;15(2):e0229345. doi: 10.1371/journal.pone.0229345. eCollection 2020.

DOI: 10.1371/journal.pone.0229345

PMID: 32084211

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7034864/

Abstract

Regressions and meta-regressions are widely used to estimate patterns and effect sizes in various disciplines. However, many biological and medical analyses use relatively small sample sizes (N), contributing to concerns about reproducibility. What is the minimum N needed to identify the most plausible data pattern using regressions? Statistical power analysis is often used to answer that question, but it has its own problems and logically should follow model selection, which first identifies the most plausible model. Here we simulate null, simple linear, and quadratic data with different variances and effect sizes. We then sample those data and use information-theoretic model selection to evaluate minimum N for regression models. We also evaluate the use of the coefficient of determination (R²) for this purpose; it is widely used but not recommended. With very low variance, both false positives and false negatives occurred at N < 8, but data shape was always clearly identified at N ≥ 8. With high variance, accurate inference was stable at N ≥ 25. Those outcomes were consistent across effect sizes. Akaike Information Criterion weights (AICc w_i) were essential to clearly identify patterns (e.g., simple linear vs. null); R² or adjusted R² values were not useful. We conclude that a minimum N = 8 is informative given very little variance, but N ≥ 25 is required with more variance. Alternative models are better compared using information-theoretic indices such as AIC than using R² or adjusted R². Insufficient N and R²-based model selection apparently contribute to confusion and low reproducibility in various disciplines. To avoid those problems, we recommend that research based on regressions or meta-regressions use N ≥ 25.
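The model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not give their simulation settings, so the slope, intercept, noise level, and random seed below are assumed values, and the function names (`aicc_least_squares`, `akaike_weights`, `model_weights`) are hypothetical. The sketch fits null, simple linear, and quadratic polynomials by least squares, scores each with the standard small-sample AICc for Gaussian errors, and converts the scores to Akaike weights.

```python
import numpy as np


def aicc_least_squares(y, y_hat, n_coef):
    """AICc for a least-squares fit, assuming Gaussian errors."""
    n = len(y)
    k = n_coef + 1  # regression coefficients plus the error variance
    rss = float(np.sum((y - y_hat) ** 2))
    aic = n * np.log(rss / n) + 2 * k
    # Small-sample correction; requires n > k + 1 (n >= 6 for the quadratic).
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert information-criterion scores to weights that sum to 1."""
    delta = np.asarray(scores, dtype=float) - np.min(scores)
    rel_likelihood = np.exp(-0.5 * delta)
    return rel_likelihood / rel_likelihood.sum()


def model_weights(x, y):
    """Akaike weights for null, linear, and quadratic fits, in that order."""
    scores = []
    for degree in (0, 1, 2):
        coefs = np.polyfit(x, y, degree)
        y_hat = np.polyval(coefs, x)
        scores.append(aicc_least_squares(y, y_hat, degree + 1))
    return akaike_weights(scores)


# Assumed example: simple linear data with moderate noise at N = 25.
rng = np.random.default_rng(0)
n = 25
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1.0, n)

weights = model_weights(x, y)
for name, w in zip(("null", "linear", "quadratic"), weights):
    print(f"{name}: {w:.3f}")
```

Repeating the last step over many random samples at each N, and recording how often the true data shape receives the largest weight, would give a rough analogue of the minimum-N evaluation the abstract describes.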
