Lazzeroni L C, Lu Y, Belitskaya-Lévy I
Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, CA, USA.
[1] VA Cooperative Studies Program Palo Alto Coordinating Center, Mountain View, CA, USA [2] Department of Health Research and Policy, Stanford University School of Medicine, CA, USA.
Mol Psychiatry. 2014 Dec;19(12):1336-40. doi: 10.1038/mp.2013.184. Epub 2014 Jan 14.
Scientists often interpret P-values as measures of the relative strength of statistical findings. This is common practice in large-scale genomic studies where P-values are used to choose which of numerous hypothesis test results should be pursued in subsequent research. In this study, we examine P-value variability to assess the degree of certainty P-values provide. We develop prediction intervals for the P-value in a replication study given the P-value observed in an initial study. The intervals depend on the initial value of P and the ratio of sample sizes between the initial and replication studies, but not on the underlying effect size or initial sample size. The intervals are valid for most large-sample statistical tests in any context, and can be used in the presence of single or multiple tests. While P-values are highly variable, future P-value variability can be explicitly predicted based on a P-value from an initial study. The relative size of the replication and initial study is an important predictor of the P-value in a subsequent replication study. We provide a handy calculator implementing these results and apply them to a study of Alzheimer's disease and recent findings of the Cross-Disorder Group of the Psychiatric Genomics Consortium. This study suggests that overinterpretation of very significant, but highly variable, P-values is an important factor contributing to the unexpectedly high incidence of non-replication. Formal prediction intervals can also provide realistic interpretations and comparisons of P-values associated with different estimated effect sizes and sample sizes.
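The dependence of the interval on only the initial P-value and the sample-size ratio can be illustrated with a minimal sketch. It is not the authors' calculator. It assumes one-sided P-values from a large-sample test whose statistic is approximately normal with unit variance, and that both studies share the same underlying effect. Under those assumptions the replication z-score minus sqrt(c) times the initial z-score has mean zero and variance 1 + c, where c is the replication-to-initial sample size ratio. The function name `replication_p_interval` and its parameters are hypothetical, introduced here only for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def replication_p_interval(p_initial, size_ratio=1.0, coverage=0.95):
    """Approximate prediction interval for a replication P-value.

    Assumptions (not taken from the abstract): one-sided P-values,
    normal test statistics with unit variance, same true effect in
    both studies. size_ratio is n_replication / n_initial.
    """
    if not 0.0 < p_initial < 1.0:
        raise ValueError("p_initial must lie strictly between 0 and 1")
    if size_ratio <= 0.0:
        raise ValueError("size_ratio must be positive")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")

    # Convert the one-sided P-value to a z-score; using -inv_cdf(p)
    # instead of inv_cdf(1 - p) keeps precision for very small p.
    z_initial = -_STD_NORMAL.inv_cdf(p_initial)

    # Predicted replication z-score and its prediction half-width.
    centre = sqrt(size_ratio) * z_initial
    half_width = _STD_NORMAL.inv_cdf(0.5 + coverage / 2.0) * sqrt(1.0 + size_ratio)

    # Map the z-score bounds back to the P-value scale
    # (a larger z-score gives a smaller P-value).
    p_low = _STD_NORMAL.cdf(-(centre + half_width))
    p_high = _STD_NORMAL.cdf(-(centre - half_width))
    return p_low, p_high


if __name__ == "__main__":
    for ratio in (0.5, 1.0, 4.0):
        low, high = replication_p_interval(0.05, size_ratio=ratio)
        print(f"ratio={ratio}: {low:.3g} to {high:.3g}")
```

Neither the effect size nor the initial sample size appears as an input, consistent with the abstract's statement. Under these assumptions, even with equal sample sizes, the interval for an initial P of 0.05 spans several orders of magnitude.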