Berry E M, Coustère-Yakir C, Grover N B
Department of Human Nutrition & Metabolism, Hebrew University, Jerusalem, Israel.
QJM. 1998 Sep;91(9):647-53. doi: 10.1093/qjmed/91.9.647.
We discuss the implications of empirical results that are statistically non-significant. Figures illustrate the interrelations among effect size, sample sizes and their dispersion, and the power of the experiment. All calculations (detailed in the Appendix) are based on actual noncentral t-distributions, with no simplifying mathematical or statistical assumptions, and the contribution of each tail is determined separately. We emphasize the importance of reporting, wherever possible, the a priori power of a study, so that the reader can see what the chances were of rejecting a null hypothesis that was false. As a practical alternative, we propose that a non-significant inference be qualified by an estimate of the sample size that a subsequent experiment would require to attain an acceptable level of power, under the assumption that the effect size observed in the sample equals the true effect size in the population; appropriate plots are provided for a power of 0.8. We also point out that successive outcomes of independent experiments, each of which may not be statistically significant on its own, can easily be combined to give an overall p value that often turns out to be significant. Finally, in the event that the p value is high and the power sufficient, a non-significant result may stand and be published as such.
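The two practical suggestions above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not name a combination method, so Fisher's method is assumed here, and the sample-size function uses a normal approximation for a two-group, equal-size, two-sided comparison rather than the exact noncentral t-distributions used in the paper. The function names and the standardized effect size `d` (difference in means divided by the common standard deviation) are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method (assumed method).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail has a closed form, so no external library
    is needed.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p value, each in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # P(chi2_{2k} > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def approx_n_per_group(d, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-group test.

    Normal approximation (an assumption of this sketch): it tends to give
    a slightly smaller n than an exact noncentral t calculation.
    """
    if d == 0:
        raise ValueError("effect size must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    # Three hypothetical studies, none significant at 0.05 on its own.
    print(fisher_combined_p([0.08, 0.10, 0.07]))
    # Hypothetical observed standardized effect size of 0.5.
    print(approx_n_per_group(0.5))
```

With the hypothetical inputs shown, the combined p value is about 0.02 even though no single study reaches 0.05, and the approximate requirement for a power of 0.8 is 63 subjects per group.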