From the Department of Anesthesiology, VU University Medical Center, Amsterdam, the Netherlands.
Anesth Analg. 2018 Mar;126(3):1068-1072. doi: 10.1213/ANE.0000000000002798.
Effect size measures are used to quantify treatment effects or associations between variables. Such measures, of which more than 70 have been described in the literature, include unstandardized and standardized differences in means, risk differences, risk ratios, odds ratios, and correlations. While null hypothesis significance testing is the predominant approach to statistical inference on effect sizes, results of such tests are often misinterpreted, provide no information on the magnitude of the estimate, and tell us nothing about the clinical importance of an effect. Hence, researchers should not merely focus on statistical significance but should also report the observed effect size. However, all samples are to some degree affected by randomness, such that there is some uncertainty about how well the observed effect size represents the actual magnitude and direction of the effect in the population. Therefore, point estimates of effect sizes should be accompanied by the entire range of plausible values to quantify this uncertainty. This facilitates assessment of how large or small the observed effect could actually be in the population of interest, and hence how clinically important it could be. This tutorial reviews different effect size measures and describes how confidence intervals can be used to address not only the statistical significance but also the clinical significance of the observed effect or association. Moreover, we discuss what P values actually represent, and how they provide supplemental information about the significant versus nonsignificant dichotomy. This tutorial intentionally focuses on an intuitive explanation of concepts and interpretation of results, rather than on the underlying mathematical theory or concepts.
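To make the idea of reporting a point estimate together with its range of plausible values concrete, the following is a minimal sketch in Python, not taken from the tutorial itself. It computes three of the effect size measures named above (risk difference, risk ratio, and odds ratio) with approximate 95% confidence intervals from a 2 x 2 table. The function name `two_by_two_effects`, the use of simple Wald-type (normal approximation) intervals, and the event counts are all illustrative assumptions; the counts are hypothetical and do not come from any study.

```python
import math

# Standard normal quantile for a two-sided 95% confidence interval.
Z95 = 1.959963984540054


def two_by_two_effects(events_t, n_t, events_c, n_c, z=Z95):
    """Return {measure: (estimate, lower, upper)} for a 2 x 2 table.

    Hypothetical helper: uses Wald-type intervals, which assume reasonably
    large groups and no zero cells.
    """
    p_t = events_t / n_t  # risk in the treatment group
    p_c = events_c / n_c  # risk in the control group

    # Risk difference: interval built on the original scale.
    rd = p_t - p_c
    se_rd = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)

    # Risk ratio: interval built on the log scale, then back-transformed.
    rr = p_t / p_c
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    rr_ci = (math.exp(math.log(rr) - z * se_log_rr),
             math.exp(math.log(rr) + z * se_log_rr))

    # Odds ratio: interval built on the log scale, then back-transformed.
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
             math.exp(math.log(odds_ratio) + z * se_log_or))

    return {
        "risk_difference": (rd, *rd_ci),
        "risk_ratio": (rr, *rr_ci),
        "odds_ratio": (odds_ratio, *or_ci),
    }


# Hypothetical data: 15 of 100 treated and 30 of 100 control patients
# experience the outcome.
results = two_by_two_effects(15, 100, 30, 100)
for name, (estimate, lower, upper) in results.items():
    print(f"{name}: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

With these hypothetical counts, each interval excludes its null value (0 for the risk difference, 1 for the ratios), which corresponds to statistical significance at the 5% level. The width of each interval, and how close its limits come to a clinically relevant threshold, is what informs the judgment about clinical importance.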