
The reproducibility of research and the misinterpretation of p-values.

Author information

Colquhoun David

Affiliation

Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK.

Publication information

R Soc Open Sci. 2017 Dec 6;4(12):171085. doi: 10.1098/rsos.171085. eCollection 2017 Dec.

Abstract

We wish to answer this question: If you observe a 'significant' p-value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by p values between 0.01 and 0.05 is explored by exact calculations of false positive risks. When you observe p = 0.05, the odds in favour of there being a real effect (given by the likelihood ratio) are about 3 : 1. This is far weaker evidence than the odds of 19 to 1 that might, wrongly, be inferred from the p value. And if you want to limit the false positive risk to 5%, you would have to assume that you were 87% sure that there was a real effect before the experiment was done. If you observe p = 0.001 in a well-powered experiment, it gives a likelihood ratio of almost 100 : 1 odds on there being a real effect. That would usually be regarded as conclusive. But the false positive risk would still be 8% if the prior probability of a real effect were only 0.1. And, in this case, if you wanted to achieve a false positive risk of 5% you would need to observe p = 0.00045. It is recommended that the terms 'significant' and 'non-significant' should never be used. Rather, p values should be supplemented by specifying the prior probability that would be needed to produce a specified (e.g. 5%) false positive risk. It may also be helpful to specify the minimum false positive risk associated with the observed p value. Despite decades of warnings, many areas of science still insist on labelling a result of p < 0.05 as 'statistically significant'. This practice must contribute to the lack of reproducibility in some areas of science. This is before you get to the many other well-known problems, like multiple comparisons, lack of randomization and p-hacking. Precise inductive inference is impossible and replication is the only way to be sure. Science is endangered by statistical misunderstanding, and by senior people who impose perverse incentives on scientists.
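The arithmetic linking likelihood ratio, prior probability and false positive risk in the abstract can be sketched with Bayes' rule in odds form. This is a minimal illustration, not the paper's exact calculation: the function names `false_positive_risk` and `prior_needed` are hypothetical, and the likelihood ratios are the rounded values quoted in the abstract (about 3 and almost 100), so the outputs only approximate the reported 87% and 8%.

```python
def false_positive_risk(likelihood_ratio, prior):
    """Probability that there is no real effect, given the evidence.

    likelihood_ratio: odds in favour of a real effect supplied by the data.
    prior: probability of a real effect before the experiment (0 < prior < 1).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds  # Bayes' rule in odds form
    return 1.0 / (1.0 + posterior_odds)


def prior_needed(likelihood_ratio, target_fpr):
    """Prior probability of a real effect required to reach a target false positive risk."""
    required_posterior_odds = (1.0 - target_fpr) / target_fpr
    prior_odds = required_posterior_odds / likelihood_ratio
    return prior_odds / (1.0 + prior_odds)


# Likelihood ratio of almost 100 : 1 with a prior of 0.1 (rounded input, assumption)
print(round(false_positive_risk(100, 0.1), 3))

# Likelihood ratio of about 3 : 1, target false positive risk of 5% (rounded input, assumption)
print(round(prior_needed(3, 0.05), 3))
```

With these rounded inputs the first call gives a false positive risk of roughly 8%, and the second gives a required prior in the mid-80% range, in line with the figures in the abstract. The small gap from 87% comes from using "about 3" in place of the exact likelihood ratio.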


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33c5/5750014/2498cecce55c/rsos171085-g1.jpg
