Bruns Stephan B, Ioannidis John P A
Meta-Research in Economics Group, University of Kassel, Kassel, Germany.
Departments of Medicine, Health Research and Policy, and Statistics, and Meta-Research Innovation Center at Stanford, Stanford University, Stanford, United States of America.
PLoS One. 2016 Feb 17;11(2):e0149144. doi: 10.1371/journal.pone.0149144. eCollection 2016.
The p-curve, the distribution of statistically significant p-values of published studies, has been used to make inferences on the proportion of true effects and on the presence of p-hacking in the published literature. We analyze the p-curve for observational research in the presence of p-hacking. We show by means of simulations that even with minimal omitted-variable bias (e.g., unaccounted confounding), p-curves based on true effects and p-curves based on null effects with p-hacking cannot be reliably distinguished. We also demonstrate this problem using, as a practical example, the evaluation of the effect of malaria prevalence on economic growth between 1960 and 1996. These findings call into question recent studies that use the p-curve to infer that most published research findings are based on true effects, both in the medical literature and in a wide range of other disciplines. p-values in observational research may need to be empirically calibrated to be interpretable with respect to the commonly used significance threshold of 0.05. Violations of randomization in experimental studies may also result in situations where the use of p-curves is similarly unreliable.
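The simulation argument can be illustrated with a minimal sketch. It is not the authors' simulation design: the data-generating process, the parameter values (`beta`, `gamma`, the 0.5 loading of the confounder on `x`), the sample sizes, and the form of p-hacking (trying several alternative outcome measures and reporting the first significant one) are all assumptions chosen for illustration, and p-values use a normal approximation to the t distribution. The function names `ols_p_value` and `simulate_p_curve` are hypothetical.

```python
import math
import random


def ols_p_value(x, y):
    """Two-sided p-value for the slope of y on x (simple OLS, normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return math.erfc(abs(slope / se) / math.sqrt(2))


def simulate_p_curve(beta, gamma, n_outcomes, n_studies=300, n=500, seed=0):
    """Share of significant p-values in the bins [0, .01), [.01, .02), ..., [.04, .05).

    beta: true effect of x on y. gamma: effect of a confounder z that is
    correlated with x but omitted from the regression. n_outcomes: number of
    alternative outcome measures tried per study (assumed form of p-hacking).
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_studies):
        z = [rng.gauss(0, 1) for _ in range(n)]        # omitted confounder
        x = [0.5 * zi + rng.gauss(0, 1) for zi in z]   # regressor correlated with z
        for _ in range(n_outcomes):
            y = [beta * xi + gamma * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
            p = ols_p_value(x, y)                      # z is left out of the model
            if p < 0.05:
                significant.append(p)                  # report first significant result
                break
    counts = [0] * 5
    for p in significant:
        counts[min(int(p / 0.01), 4)] += 1
    total = len(significant)
    return [c / total for c in counts] if total else counts


# True effect, no confounding, no p-hacking.
true_effect = simulate_p_curve(beta=0.12, gamma=0.0, n_outcomes=1)
# Null effect, omitted confounder, p-hacking over three outcome measures.
hacked_null = simulate_p_curve(beta=0.0, gamma=0.3, n_outcomes=3)

print("true effect :", [round(s, 2) for s in true_effect])
print("hacked null :", [round(s, 2) for s in hacked_null])
```

Under these assumed settings both p-curves are right-skewed, with most significant p-values falling below 0.01, so the shape alone does not separate the true effect from the confounded, p-hacked null. Shrinking `gamma` while raising `n` should give a similar picture, since the t-statistic produced by a fixed bias grows with the sample size.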