Redefining significance and reproducibility for medical research: A plea for higher P-value thresholds for diagnostic and prognostic models.

Affiliations

Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands.

Department of Development and Regeneration, KU Leuven, Leuven, Belgium.

Publication information

Eur J Clin Invest. 2020 May;50(5):e13229. doi: 10.1111/eci.13229. Epub 2020 May 9.

DOI:10.1111/eci.13229

PMID:32281648

Abstract

The role of P-values for null hypothesis testing is under debate. We aim to explore the impact of the significance threshold on estimates for the strengths of associations ("effects") and the implications for different types of epidemiological research. We consider situations with normal distribution of a true effect, while varying the effect size. We confirm the occurrence of "testimation bias": estimating effect size only if the test was statistically significant leads to exaggerated results. The absolute bias is largest for true effects around 0.7 times the size of the standard error: +220% bias if effects are selected after testing with P < .05, and +335% if tested with P < .005. Less bias was found for testing with P < .20 (+130%) and larger true effect sizes. We conclude that a lower P-value threshold for declaring statistical significance implies more exaggeration in an estimated effect. This implies that if a low threshold is used, effect size estimation should not be attempted, for example in the context of selecting promising discoveries that need further validation. Confirmatory studies, such as randomized controlled trials, might stick to the 0.05 threshold if adequately powered, while prediction modelling studies should use an even higher threshold, such as 0.2, to avoid strongly biased effect estimates.
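The bias figures quoted in the abstract can be approximated with a short closed-form calculation. The sketch below is not the authors' code; it assumes one plausible reading of the setup: the estimate, expressed in standard-error units, is Z ~ N(delta, 1), it is reported only when a two-sided test is significant at level alpha, and relative bias is (E[Z | significant] - delta) / delta. The function names `conditional_mean` and `relative_bias` are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def conditional_mean(delta: float, alpha: float) -> float:
    """E[Z | |Z| > z_crit] for Z ~ N(delta, 1) and a two-sided test at level alpha.

    Assumption: delta is the true effect in units of its standard error.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Cut-offs after centring Z at its true mean delta.
    lower, upper = -z_crit - delta, z_crit - delta
    # Probability that the test is significant (in either direction).
    p_sig = STD_NORMAL.cdf(lower) + (1 - STD_NORMAL.cdf(upper))
    # Mean of a normal truncated to the two tails:
    # E[Z; significant] = delta * p_sig + pdf(upper) - pdf(lower)
    return delta + (STD_NORMAL.pdf(upper) - STD_NORMAL.pdf(lower)) / p_sig


def relative_bias(delta: float, alpha: float) -> float:
    """Relative exaggeration of the estimate when only significant results are kept."""
    return (conditional_mean(delta, alpha) - delta) / delta


if __name__ == "__main__":
    for alpha in (0.20, 0.05, 0.005):
        bias = relative_bias(0.7, alpha)
        print(f"P < {alpha}: relative bias {bias:+.0%}")
```

Under these assumptions, a true effect of 0.7 standard errors gives relative biases of roughly +130%, +220% and +340% for thresholds of 0.20, 0.05 and 0.005, close to the values reported in the abstract. The small difference at 0.005 may reflect details of the authors' method that the abstract does not state.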