Hail the impossible: p-values, evidence, and likelihood.

Affiliation

Kristianstad University, Kristianstad, Sweden.

Publication information

Scand J Psychol. 2011 Apr;52(2):113-25. doi: 10.1111/j.1467-9450.2010.00852.x. Epub 2010 Nov 16.

Abstract

Significance testing based on p-values is standard in psychological research and teaching. Typically, research articles and textbooks present and use p as a measure of statistical evidence against the null hypothesis (the Fisherian interpretation), while relying on concepts and tools based on a completely different use of p, as a tool for controlling long-term decision errors (the Neyman-Pearson interpretation). There are four major problems with using p as a measure of evidence, and these problems are often overlooked in psychology. First, p is uniformly distributed under the null hypothesis and can therefore never indicate evidence for the null. Second, p is conditioned solely on the null hypothesis and is therefore unsuited to quantifying evidence, because evidence is always relative: it is evidence for or against one hypothesis relative to another hypothesis. Third, p designates the probability of obtaining evidence (given the null), rather than the strength of evidence. Fourth, p depends on unobserved data and on subjective intentions; under the evidential interpretation, it therefore implies that the evidential strength of the observed data depends on things that did not happen and on subjective intentions. In sum, using p in the Fisherian sense as a measure of statistical evidence is deeply problematic, both statistically and conceptually, while the Neyman-Pearson interpretation is not about evidence at all. In contrast, the likelihood ratio escapes these problems and is recommended as a tool for psychologists to represent the statistical evidence conveyed by the obtained data relative to two hypotheses.
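The fourth problem (dependence on intentions) and the likelihood-ratio alternative can be illustrated with a minimal sketch. It assumes a textbook-style coin example that does not come from the article: 9 successes and 3 failures, a null of θ = 0.5, and a hypothetical alternative θ = 0.75 chosen only for illustration. All function names are hypothetical. The same data receive different one-sided p-values depending on whether the sample size was fixed in advance or sampling stopped at the third failure, while the likelihood ratio is identical under both designs.

```python
from math import comb

# Hypothetical data (an illustrative assumption, not taken from the article):
# 12 Bernoulli trials with 9 successes and 3 failures.
SUCCESSES, FAILURES = 9, 3
THETA_NULL = 0.5   # H0: success probability 0.5
THETA_ALT = 0.75   # H1: assumed alternative, chosen only for illustration


def p_fixed_n(successes: int, failures: int, theta: float) -> float:
    """One-sided p-value, P(X >= successes), if n was fixed in advance (binomial design)."""
    n = successes + failures
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k)
               for k in range(successes, n + 1))


def p_stop_at_failures(successes: int, failures: int, theta: float) -> float:
    """One-sided p-value if sampling stopped at the r-th failure (negative binomial design)."""
    r = failures
    # P(Y = y): y successes observed before the r-th failure, the last trial being a failure.
    p_below = sum(comb(y + r - 1, r - 1) * theta**y * (1 - theta) ** r
                  for y in range(successes))
    return 1.0 - p_below  # P(Y >= successes)


def likelihood_ratio(successes: int, failures: int, theta1: float, theta0: float) -> float:
    """Likelihood ratio of theta1 versus theta0; the design-specific constants cancel."""
    def kernel(t: float) -> float:
        return t**successes * (1 - t) ** failures
    return kernel(theta1) / kernel(theta0)


if __name__ == "__main__":
    print(f"p (fixed n = 12):        {p_fixed_n(SUCCESSES, FAILURES, THETA_NULL):.4f}")
    print(f"p (stop at 3rd failure): {p_stop_at_failures(SUCCESSES, FAILURES, THETA_NULL):.4f}")
    print(f"LR (0.75 vs 0.5):        "
          f"{likelihood_ratio(SUCCESSES, FAILURES, THETA_ALT, THETA_NULL):.3f}")
```

With these numbers the fixed-n p-value is about 0.073 and the stop-at-third-failure p-value is about 0.033, so the conventional 0.05 verdict flips with the experimenter's intention. The likelihood ratio (about 4.81 in favour of θ = 0.75) depends only on the observed counts, because the binomial and negative binomial likelihoods differ by a constant factor that cancels.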
