Predictive power of statistical significance.

Author information

Heston Thomas F, King Jackson M

Affiliations

Department of Family Medicine, University of Washington, Seattle, WA 98195-6340, United States.

Department of Medical Education and Clinical Sciences, Elson S. Floyd College of Medicine, Washington State University, Spokane, WA 99210-1495, United States.

Publication information

World J Methodol. 2017 Dec 26;7(4):112-116. doi: 10.5662/wjm.v7.i4.112.

DOI:10.5662/wjm.v7.i4.112

PMID:29354483

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5746664/

Abstract

A statistically significant research finding should not be defined as a P-value of 0.05 or less, because this definition does not take into account study power. Statistical significance was originally defined by Fisher RA as a P-value of 0.05 or less. According to Fisher, any finding that is likely to occur by random variation no more than 1 in 20 times is considered significant. Neyman J and Pearson ES subsequently argued that Fisher's definition was incomplete. They proposed that statistical significance could only be determined by analyzing the chance of incorrectly considering a study finding significant (a Type I error) or incorrectly considering a study finding insignificant (a Type II error). Their definition of statistical significance is also incomplete because the error rates are considered separately, not together. A better definition of statistical significance is the positive predictive value of a P-value, which is equal to the power divided by the sum of the power and the P-value. This definition is more complete and relevant than Fisher's or Neyman-Pearson's definitions, because it takes into account both concepts of statistical significance. Using this definition, a statistically significant finding requires a P-value of 0.05 or less when the power is at least 95%, and a P-value of 0.032 or less when the power is 60%. To achieve statistical significance, P-values must be adjusted downward as the study power decreases.
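The arithmetic behind the two thresholds quoted in the abstract can be checked with a short sketch. It uses the abstract's formula, positive predictive value = power / (power + P-value), and assumes that "statistically significant" means a positive predictive value of at least 95%. That target is not stated explicitly in the abstract; it is inferred from the two numerical examples given (0.05 at 95% power, 0.032 at 60% power). The function names `positive_predictive_value` and `max_p_value` and the parameter `target_ppv` are hypothetical, chosen only for illustration.

```python
def positive_predictive_value(power: float, p_value: float) -> float:
    """PPV of a P-value as defined in the abstract: power / (power + P-value)."""
    if not 0.0 < power <= 1.0:
        raise ValueError("power must be in (0, 1]")
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be in [0, 1]")
    return power / (power + p_value)


def max_p_value(power: float, target_ppv: float = 0.95) -> float:
    """Largest P-value whose PPV still reaches target_ppv.

    Solving power / (power + p) >= target_ppv for p gives
    p <= power * (1 - target_ppv) / target_ppv.
    The default target of 0.95 is an assumption inferred from the
    abstract's examples, not a value the abstract states directly.
    """
    if not 0.0 < power <= 1.0:
        raise ValueError("power must be in (0, 1]")
    if not 0.0 < target_ppv < 1.0:
        raise ValueError("target_ppv must be in (0, 1)")
    return power * (1.0 - target_ppv) / target_ppv


if __name__ == "__main__":
    # Tabulate the P-value ceiling for a few illustrative power levels.
    for power in (0.95, 0.80, 0.60):
        print(f"power={power:.2f}  max P-value={max_p_value(power):.3f}")
```

Under this assumed 95% target, the ceiling works out to 0.050 at 95% power and about 0.032 at 60% power, matching the abstract; the 80% row (about 0.042) is an extra illustration, not a figure from the source.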
