Joseph Bernstein, Kevin McGuire, Kevin B. Freedman
Department of Orthopaedic Surgery, 424 Stemmler Hall, University of Pennsylvania, Philadelphia, PA 19104-6081, USA.
Clin Orthop Relat Res. 2003 Aug;(413):55-62. doi: 10.1097/01.blo.0000079769.06654.8c.
The purpose of the current article was to review the process of hypothesis testing and statistical sampling and to help readers critically appraise the literature. When the p value of a study lies above the alpha threshold, the results are said to be not statistically significant. It is possible, however, that real differences do exist but that the study was not powerful enough to detect them. In that case, concluding that the two groups are equivalent is wrong. The probability of this mistake, the Type II error, is denoted beta. The complement of beta (1 - beta), the probability of avoiding a Type II error, is termed the statistical power of the study. We previously examined the statistical power and sample size of all studies published in 1997 in the American and British volumes of the Journal of Bone and Joint Surgery and in Clinical Orthopaedics and Related Research. Only 3% of the studies in this sample had adequate statistical power to detect a small effect size. In addition, a study limited to the randomized controlled trials in these journals showed that none of 25 such trials had adequate statistical power to detect a small effect size. Compared with alpha and the p value, however, beta and power are less well understood. Researchers and readers should therefore address statistical power before a study begins and be cautious of studies that conclude that no difference exists between groups.
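To make the link between beta, power, effect size, and sample size concrete, here is a minimal Python sketch. It is not the authors' method. It assumes a two-sided, two-sample comparison of means with equal group sizes and a known common standard deviation (a normal approximation rather than a t-test), and it adopts the common conventions of alpha = 0.05, 80% power as "adequate", and Cohen's d = 0.2 as a "small" effect; the abstract itself does not state which thresholds were used. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution (Python 3.8+ standard library)
Z = NormalDist()


def power_two_sample(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided, two-sample test of means.

    Assumes equal group sizes n, a known common standard deviation, and the
    normal approximation. d is the standardized effect size (Cohen's d).
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)  # critical value for the alpha threshold
    # Expected z statistic when the true difference is d standard deviations
    shift = d * sqrt(n / 2)
    # Probability of rejecting the null hypothesis in either tail
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def type_ii_error(d: float, n: int, alpha: float = 0.05) -> float:
    """Beta: probability of missing a real effect of size d."""
    return 1 - power_two_sample(d, n, alpha)


def sample_size_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Equal group size needed for the target power.

    Uses the standard normal-approximation formula, which ignores the
    negligible rejection probability in the opposite tail.
    """
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


if __name__ == "__main__":
    small = 0.2  # Cohen's conventional "small" effect size
    for n in (25, 50, 100, 393):
        print(f"n = {n:3d} per group: power = {power_two_sample(small, n):.2f}, "
              f"beta = {type_ii_error(small, n):.2f}")
    print("n per group for 80% power:", sample_size_per_group(small))
```

Under these assumptions, a study with 50 subjects per group has only about a 17% chance of detecting a small effect (beta of roughly 0.83), and about 393 subjects per group are needed to reach 80% power; an exact t-test calculation gives a slightly larger number. A "no significant difference" result from a study of modest size therefore says little about whether the groups are truly equivalent.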