Goodman S N
Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.
Ann Intern Med. 1999 Jun 15;130(12):995-1004. doi: 10.7326/0003-4819-130-12-199906150-00008.
An important problem exists in the interpretation of modern medical research data: Biological understanding and previous research play little formal role in the interpretation of quantitative results. This phenomenon is manifest in the discussion sections of research articles and ultimately can affect the reliability of conclusions. The standard statistical approach has created this situation by promoting the illusion that conclusions can be produced with certain "error rates," without consideration of information from outside the experiment. This statistical approach, the key components of which are P values and hypothesis tests, is widely perceived as a mathematically coherent approach to inference. There is little appreciation in the medical community that the methodology is an amalgam of incompatible elements, whose utility for scientific inference has been the subject of intense debate among statisticians for almost 70 years. This article introduces some of the key elements of that debate and traces the appeal and adverse impact of this methodology to the P value fallacy, the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result. This argument is made as a prelude to the suggestion that another measure of evidence should be used: the Bayes factor, which properly separates issues of long-run behavior from evidential strength and allows the integration of background knowledge with statistical findings.
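As a rough illustration of how a Bayes factor can combine background knowledge with a statistical finding, the sketch below converts a two-sided P value into a minimum Bayes factor and then updates a prior probability for the null hypothesis. The formula exp(-z^2/2) is an assumption here: it is the standard lower bound under a Gaussian approximation and is not stated in the abstract. The function names and the 50% prior are hypothetical, chosen only for the example.

```python
import math
from statistics import NormalDist


def min_bayes_factor(p_two_sided: float) -> float:
    """Smallest Bayes factor (null vs. best-supported alternative).

    Assumes a Gaussian test statistic; exp(-z^2 / 2) is a lower bound,
    not the Bayes factor for any particular alternative hypothesis.
    """
    # Convert the two-sided P value to the corresponding z-score.
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return math.exp(-z * z / 2)


def posterior_prob_null(prior_prob_null: float, bayes_factor: float) -> float:
    """Update the probability of the null: posterior odds = prior odds * BF."""
    prior_odds = prior_prob_null / (1 - prior_prob_null)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    bf = min_bayes_factor(0.05)            # hypothetical observed P value
    post = posterior_prob_null(0.5, bf)    # hypothetical 50% prior on the null
    print(f"minimum Bayes factor: {bf:.3f}")
    print(f"posterior probability of the null: {post:.3f}")
```

Under these assumptions, a result with P = 0.05 still leaves the null hypothesis with a posterior probability well above 5% when the prior is 50%, which is one way to see that the P value alone does not measure the evidential meaning of a single result.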