Hartgerink Chris H J, van Aert Robbie C M, Nuijten Michèle B, Wicherts Jelte M, van Assen Marcel A L M
Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands.
Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands; Department of Sociology, Utrecht University, Utrecht, The Netherlands.
PeerJ. 2016 Apr 11;4:e1935. doi: 10.7717/peerj.1935. eCollection 2016.
Previous studies provided mixed findings on peculiarities in p-value distributions in psychology. This paper examined 258,050 test results across 30,710 articles from eight high-impact journals to investigate whether the psychological literature shows a peculiar prevalence of p-values just below .05 (i.e., a bump), and whether that prevalence has increased over time. We indeed found evidence for a bump just below .05 in the distribution of exactly reported p-values in the journals Developmental Psychology, Journal of Applied Psychology, and Journal of Personality and Social Psychology, but the bump did not increase over the years and disappeared when using recalculated p-values. We found clear and direct evidence for the questionable research practice (QRP) "incorrect rounding of p-value" (John, Loewenstein & Prelec, 2012) in all psychology journals. Finally, we also investigated monotonic excess of p-values, an effect of certain QRPs that has been neglected in previous research, and developed two measures to detect it by modeling the distributions of statistically significant p-values. Using simulations and applying the two measures to the retrieved test results, we argue that, although one of the measures suggests the use of QRPs in psychology, it is difficult to draw general conclusions concerning QRPs based on modeling of p-value distributions.