p-Curve and p-Hacking in Observational Research.

Author information

Bruns Stephan B, Ioannidis John P A

Affiliations

Meta-Research in Economics Group, University of Kassel, Kassel, Germany.

Departments of Medicine, Health Research and Policy, and Statistics, and Meta-Research Innovation Center at Stanford, Stanford University, Stanford, United States of America.

Publication information

PLoS One. 2016 Feb 17;11(2):e0149144. doi: 10.1371/journal.pone.0149144. eCollection 2016.

DOI:10.1371/journal.pone.0149144

PMID:26886098

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4757561/

Abstract

The p-curve, the distribution of statistically significant p-values of published studies, has been used to make inferences on the proportion of true effects and on the presence of p-hacking in the published literature. We analyze the p-curve for observational research in the presence of p-hacking. We show by means of simulations that even with minimal omitted-variable bias (e.g., unaccounted confounding), p-curves based on true effects and p-curves based on null effects with p-hacking cannot be reliably distinguished. We also demonstrate this problem using, as a practical example, the evaluation of the effect of malaria prevalence on economic growth between 1960 and 1996. These findings call into question recent studies that use the p-curve to infer that most published research findings are based on true effects in the medical literature and in a wide range of disciplines. p-values in observational research may need to be empirically calibrated to be interpretable with respect to the commonly used significance threshold of 0.05. Violations of randomization in experimental studies may also result in situations where the use of p-curves is similarly unreliable.
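The simulation argument can be illustrated with a minimal Python sketch. It is not the authors' simulation design: the sample size, the number of candidate controls, the effect sizes, the modest level of confounding, and the p-hacking rule (report the smallest p-value across all subsets of controls) are assumptions chosen for illustration, and p-values use a normal approximation. The sketch compares the share of significant p-values below 0.025, a simple proxy for right skew, between a true effect estimated with all confounders controlled and a null effect where p-hacking exploits omitted-variable bias.

```python
import itertools
import math

import numpy as np


def p_value_on_x(y, x, controls):
    """Two-sided p-value (normal approximation) for the OLS coefficient on x."""
    n = len(y)
    design = np.column_stack([x, np.ones(n)] + controls)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    var_x = sigma2 * np.linalg.inv(design.T @ design)[0, 0]
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)).
    return math.erfc(abs(coef[0]) / math.sqrt(2.0 * var_x))


def simulate_study(rng, beta, hack, n=200, k=3, a=0.3, gamma=0.3):
    """Return the reported p-value of one hypothetical observational study.

    beta  : true effect of x on y (0.0 means a null effect)
    hack  : if True, report the smallest p-value over all control subsets
    a     : assumed strength of the link between each confounder and x
    gamma : assumed effect of each confounder on y
    """
    z = [rng.standard_normal(n) for _ in range(k)]
    x = a * sum(z) + rng.standard_normal(n)
    y = beta * x + gamma * sum(z) + rng.standard_normal(n)
    if not hack:
        # Correctly specified model: all confounders are controlled for.
        return p_value_on_x(y, x, z)
    # Assumed p-hacking rule: try every subset of controls, keep the best p.
    return min(
        p_value_on_x(y, x, list(subset))
        for r in range(k + 1)
        for subset in itertools.combinations(z, r)
    )


def share_below_0025(p_values):
    """Share of significant p-values (< 0.05) that fall below 0.025."""
    significant = [p for p in p_values if p < 0.05]
    return sum(p < 0.025 for p in significant) / len(significant)


rng = np.random.default_rng(0)
n_studies = 400

# Scenario A: true effect, full set of controls, no p-hacking.
p_true = [simulate_study(rng, beta=0.2, hack=False) for _ in range(n_studies)]
# Scenario B: null effect, p-hacking over which confounders to omit.
p_hacked = [simulate_study(rng, beta=0.0, hack=True) for _ in range(n_studies)]

share_true = share_below_0025(p_true)
share_hacked = share_below_0025(p_hacked)
print(f"true effect, no p-hacking:  share of p < 0.025 = {share_true:.2f}")
print(f"null effect, with p-hacking: share of p < 0.025 = {share_hacked:.2f}")
```

A share clearly above 0.5 indicates a right-skewed p-curve. Under these assumed settings both scenarios tend to produce right skew, which is why skew alone is a weak basis for separating true effects from p-hacked null effects in observational data.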

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7360/4757561/b43ae56830ba/pone.0149144.g001.jpg
