Confidence intervals permit, but do not guarantee, better inference than statistical significance testing.

Author information

Coulson Melissa, Healey Michelle, Fidler Fiona, Cumming Geoff

Affiliation

Statistical Cognition Laboratory, School of Psychological Science, La Trobe University, Melbourne, VIC, Australia.

Publication information

Front Psychol. 2010 Jul 2;1:26. doi: 10.3389/fpsyg.2010.00026. eCollection 2010.

A statistically significant result and a non-significant result may differ little, although their differing significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST) or confidence intervals (CIs). Authors of articles published in psychology, behavioral neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs, respondents who mentioned NHST were 60% likely to conclude, unjustifiably, that the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, that the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouraging meta-analytic thinking and the use of CIs but, for full benefit, such highly desirable statistical reform also requires that researchers interpret CIs without recourse to NHST.
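
The abstract's central point, that one significant and one non-significant result can be consistent with each other, can be illustrated with a minimal sketch. The numbers below are hypothetical and are not the fictitious studies shown to respondents. The sketch assumes a normal approximation for the CIs and p-values and a simple fixed-effect (inverse-variance) pooling as one possible form of meta-analytic thinking. The helper names `summarize` and `pool_fixed_effect` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # critical value for a 95% interval


def summarize(mean_diff, se):
    """Return (lower, upper, two-sided p) under a normal approximation."""
    lower = mean_diff - Z95 * se
    upper = mean_diff + Z95 * se
    p = 2 * (1 - STD_NORMAL.cdf(abs(mean_diff) / se))
    return lower, upper, p


def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of (mean_diff, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * d for w, (d, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical (mean difference, standard error) pairs, NOT from the paper.
study_a = (3.0, 1.4)
study_b = (2.5, 1.6)

lo_a, hi_a, p_a = summarize(*study_a)
lo_b, hi_b, p_b = summarize(*study_b)

# Direct comparison of the two studies: difference of the two estimates.
diff = study_a[0] - study_b[0]
diff_se = sqrt(study_a[1] ** 2 + study_b[1] ** 2)
lo_d, hi_d, p_d = summarize(diff, diff_se)

# Combined evidence from both studies.
pooled, pooled_se = pool_fixed_effect([study_a, study_b])
lo_p, hi_p, p_p = summarize(pooled, pooled_se)

for label, lo, hi, p in [
    ("Study A", lo_a, hi_a, p_a),
    ("Study B", lo_b, hi_b, p_b),
    ("A minus B", lo_d, hi_d, p_d),
    ("Pooled", lo_p, hi_p, p_p),
]:
    print(f"{label:10s} 95% CI [{lo:6.2f}, {hi:6.2f}]  p = {p:.3f}")
```

With these assumed inputs, Study A falls below the 0.05 threshold and Study B does not, yet their intervals overlap heavily and the interval for their difference comfortably includes zero. Reading the two CIs together, or pooling them, points toward consistency, whereas comparing only the significance labels suggests a conflict.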
