Bishop Dorothy V M, Thompson Paul A
Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom.
PeerJ. 2016 Feb 18;4:e1715. doi: 10.7717/peerj.1715. eCollection 2016.
Background. The p-curve is a plot of the distribution of p-values reported in a set of scientific studies. Comparisons between ranges of p-values have been used to evaluate fields of research in terms of the extent to which studies have genuine evidential value, and the extent to which they suffer from p-hacking, that is, bias in the selection of variables and analyses for publication.

Methods. p-hacking can take various forms. Here we used R code to simulate the use of ghost variables, where an experimenter gathers data on several dependent variables but reports only those with statistically significant effects. We also examined a text-mined dataset used by Head et al. (2015) and assessed its suitability for investigating p-hacking.

Results. We show that when there is ghost p-hacking, the shape of the p-curve depends on whether the dependent variables are intercorrelated. For uncorrelated variables, simulated p-hacked data do not give the "p-hacking bump" just below .05 that is regarded as evidence of p-hacking, though there is a negative (left) skew when the simulated variables are intercorrelated. Because p-curves vary with features of the underlying data, problems arise when automated text mining is used to detect p-values in heterogeneous sets of published papers.

Conclusions. The absence of a bump in the p-curve does not indicate an absence of p-hacking. Furthermore, while studies with evidential value will usually generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value unless we have a model specific to the type of p-values entered into the analysis. We conclude that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis. In particular, p-hacking with ghost variables is likely to be missed.
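To make the ghost-variable scenario concrete, the following is a minimal Python sketch of that kind of simulation. It is not the authors' R code. Several choices are illustrative assumptions: the function names, the group size, the number of dependent variables, the use of a shared latent factor to induce a pairwise correlation `rho`, the use of a two-sample z-test with known unit variance in place of a t-test, and the rule that every dependent variable with p < .05 is reported. No true group difference is simulated, so every reported effect is a false positive.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05
_STD_NORMAL = NormalDist()


def two_sided_p(group_a, group_b):
    """Two-sample z-test p-value, assuming known unit variance and equal group sizes."""
    n = len(group_a)
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / math.sqrt(2.0 / n)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def simulate_participant(n_vars, rho, rng):
    """Draw n_vars unit-variance scores with pairwise correlation rho (assumed model)."""
    shared = rng.gauss(0.0, 1.0)  # latent factor common to all variables
    return [
        math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        for _ in range(n_vars)
    ]


def ghost_variable_study(n_per_group, n_vars, rho, rng):
    """Run one null study and return only the p-values below ALPHA."""
    group_a = [simulate_participant(n_vars, rho, rng) for _ in range(n_per_group)]
    group_b = [simulate_participant(n_vars, rho, rng) for _ in range(n_per_group)]
    p_values = [
        two_sided_p([row[j] for row in group_a], [row[j] for row in group_b])
        for j in range(n_vars)
    ]
    # Ghost variables: non-significant dependent variables are silently dropped.
    return [p for p in p_values if p < ALPHA]


def simulate_reported(n_studies, n_per_group=20, n_vars=8, rho=0.0, seed=0):
    """Pool the reported p-values from many simulated studies."""
    rng = random.Random(seed)
    reported = []
    for _ in range(n_studies):
        reported.extend(ghost_variable_study(n_per_group, n_vars, rho, rng))
    return reported


def p_curve(reported, n_bins=5):
    """Count reported p-values in equal-width bins spanning (0, ALPHA)."""
    counts = [0] * n_bins
    width = ALPHA / n_bins
    for p in reported:
        counts[min(int(p / width), n_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    for rho in (0.0, 0.8):
        reported = simulate_reported(n_studies=1000, rho=rho, seed=1)
        print(f"rho={rho}: bin counts {p_curve(reported)}")
```

The shape of the resulting histogram depends on the assumed selection rule, so the sketch is best read as a template: changing what `ghost_variable_study` returns (for example, only the smallest p-value in each study) lets different reporting behaviours be compared.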