Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, Australia.
Department of Biological Sciences, Bishop's University, Sherbrooke, Canada.
PLoS Biol. 2019 Jan 25;17(1):e3000127. doi: 10.1371/journal.pbio.3000127. eCollection 2019 Jan.
There is increasing concern about poor scientific practices arising from an excessive focus on P-values. Two particularly worrisome practices are selective reporting of significant results and 'P-hacking'. The latter is the manipulation of data collection, usage, or analyses to obtain statistically significant outcomes. Here, we introduce the concepts, novel to our knowledge, of selective reporting of nonsignificant results and 'reverse P-hacking', whereby researchers ensure that tests produce a nonsignificant result. We test whether these practices occur in experiments in which researchers randomly assign subjects to treatment and control groups to minimise differences in confounding variables that might affect the focal outcome. By chance alone, 5% of tests for a group difference in confounding variables should yield a significant result (P < 0.05). If researchers less often report significant findings and/or reverse P-hack to avoid significant outcomes that undermine the ethos that experimental and control groups only differ with respect to actively manipulated variables, we expect significant results from tests for group differences to be under-represented in the literature. We surveyed the behavioural ecology literature and found significantly more nonsignificant P-values reported for tests of group differences in potentially confounding variables than the expected 95% (P = 0.005; N = 250 studies). This publication bias, novel to our knowledge, could result from selective reporting of nonsignificant results and/or from reverse P-hacking. We encourage others to test for a bias toward publishing nonsignificant results in the equivalent context in their own research discipline.
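The expectation stated in the abstract (5% of tests significant by chance, so 95% nonsignificant) can be checked with a one-sided exact binomial test. The sketch below is only an illustration of that reasoning and is not necessarily the analysis the authors ran. The function name `binom_lower_tail` and the counts `n_tests` and `n_significant` are hypothetical and are not taken from the study.

```python
from math import comb


def binom_lower_tail(k: int, n: int, p: float = 0.05) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p).

    With p = 0.05 this is the probability of seeing k or fewer
    significant results among n independent tests when the null
    hypothesis (no group difference) is true for every test.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical counts for illustration only (NOT data from the study):
# 100 reported tests of group differences, of which 1 was significant.
n_tests = 100
n_significant = 1

# One-sided P-value: how surprising is it to see this few significant
# results if 5% are expected by chance alone?
p_value = binom_lower_tail(n_significant, n_tests)
print(f"P(X <= {n_significant} | n = {n_tests}, p = 0.05) = {p_value:.4f}")
```

A small value here would indicate fewer significant results than chance predicts, which is the pattern the abstract attributes to selective reporting of nonsignificant results and/or reverse P-hacking. The sketch assumes the tests are independent, which may not hold when several tests come from the same study.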