P values are only an index to evidence: 20th- vs. 21st-century statistical science.

Publication information

Ecology. 2014 Mar;95(3):627-30. doi: 10.1890/13-1066.1.

Abstract

Early statistical methods focused on pre-data probability statements (i.e., data as random variables) such as P values; these are not really inferences, nor are P values evidential. Statistical science clung to these principles throughout much of the 20th century as a wide variety of methods were developed for special cases. Looking back, it is clear that the underlying paradigm (i.e., testing and P values) was weak. As Kuhn (1970) suggests, new paradigms have taken the place of earlier ones: this is a goal of good science. New methods have been developed and older methods extended, and these allow proper measures of strength of evidence and multimodel inference. It is time to move forward with sound theory and practice for the difficult practical problems that lie ahead. Given data, the useful foundation shifts to post-data probability statements such as model probabilities (Akaike weights) or related quantities such as odds ratios and likelihood intervals. These new methods allow formal inference from multiple models in the a priori set of models. These quantities are properly evidential. The past century was aimed at finding the "best" model and making inferences from it. The goal in the 21st century is to base inference on all the models, weighted by their model probabilities (model averaging). Estimates of precision can include model selection uncertainty, leading to variances conditional on the model set. The 21st century will be about the quantification of information, proper measures of evidence, and multimodel inference. Nelder (1999:261) concludes, "The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science and technology."
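The abstract names several post-data quantities without showing how they are computed. The Python sketch below illustrates them under stated assumptions. It computes Akaike weights from AIC differences, Δi = AICi − min AIC, as wi = exp(−Δi/2) / Σj exp(−Δj/2). It then takes the ratio of two model probabilities, wi/wj, which gives the odds of one model over another. Finally, it forms a model-averaged estimate whose variance adds each model's squared deviation from the average to that model's conditional variance, so that model selection uncertainty is included. The function names are illustrative and do not come from the source. The AIC values, estimates, and variances are hypothetical numbers chosen only for illustration. The variance formula is one commonly used estimator of variance conditional on the model set, not necessarily the exact form the authors recommend.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights (model probabilities) from a list of AIC values.

    Delta_i = AIC_i - min(AIC); w_i = exp(-Delta_i / 2) / sum_j exp(-Delta_j / 2).
    """
    best = min(aic_values)
    rel_likelihoods = [math.exp(-(aic - best) / 2.0) for aic in aic_values]
    total = sum(rel_likelihoods)
    return [r / total for r in rel_likelihoods]


def evidence_ratio(weights, i, j):
    """Ratio of model probabilities w_i / w_j (odds of model i over model j)."""
    return weights[i] / weights[j]


def model_average(estimates, variances, weights):
    """Model-averaged estimate and a variance that includes model selection uncertainty.

    Assumed estimator: var = sum_i w_i * (var_i + (theta_i - theta_bar)^2),
    i.e. each model's conditional variance plus its squared distance from
    the averaged estimate, weighted by the model probabilities.
    """
    theta_bar = sum(w * t for w, t in zip(weights, estimates))
    var_bar = sum(
        w * (v + (t - theta_bar) ** 2)
        for w, t, v in zip(weights, estimates, variances)
    )
    return theta_bar, var_bar


# Hypothetical AIC values, parameter estimates and conditional variances
# for three candidate models in an a priori set (illustration only).
aic = [100.0, 102.0, 106.0]
theta_hat = [1.20, 1.05, 0.80]
var_hat = [0.04, 0.05, 0.09]

w = akaike_weights(aic)
theta_bar, var_bar = model_average(theta_hat, var_hat, w)

print("Akaike weights:", [round(x, 3) for x in w])
# A difference of 2 AIC units gives an evidence ratio of exp(1), about 2.718.
print("w1 / w2:", round(evidence_ratio(w, 0, 1), 3))
print("model-averaged estimate:", round(theta_bar, 3))
print("SE including model selection uncertainty:", round(math.sqrt(var_bar), 3))
```

Because every model contributes in proportion to its weight, the averaged estimate does not hinge on a single "best" model. The between-model spread term also makes the resulting standard error no smaller than the weighted average of the conditional variances would imply.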