P-values in genomics: apparent precision masks high uncertainty.

Authors

Lazzeroni L C, Lu Y, Belitskaya-Lévy I

Affiliations

Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, CA, USA.

[1] VA Cooperative Studies Program Palo Alto Coordinating Center, Mountain View, CA, USA [2] Department of Health Research and Policy, Stanford University School of Medicine, CA, USA.

Publication information

Mol Psychiatry. 2014 Dec;19(12):1336-40. doi: 10.1038/mp.2013.184. Epub 2014 Jan 14.

DOI:10.1038/mp.2013.184

PMID:24419042

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4255087/

Abstract

Scientists often interpret P-values as measures of the relative strength of statistical findings. This is common practice in large-scale genomic studies where P-values are used to choose which of numerous hypothesis test results should be pursued in subsequent research. In this study, we examine P-value variability to assess the degree of certainty P-values provide. We develop prediction intervals for the P-value in a replication study given the P-value observed in an initial study. The intervals depend on the initial value of P and the ratio of sample sizes between the initial and replication studies, but not on the underlying effect size or initial sample size. The intervals are valid for most large-sample statistical tests in any context, and can be used in the presence of single or multiple tests. While P-values are highly variable, future P-value variability can be explicitly predicted based on a P-value from an initial study. The relative size of the replication and initial study is an important predictor of the P-value in a subsequent replication study. We provide a handy calculator implementing these results and apply them to a study of Alzheimer's disease and recent findings of the Cross-Disorder Group of the Psychiatric Genomics Consortium. This study suggests that overinterpretation of very significant, but highly variable, P-values is an important factor contributing to the unexpectedly high incidence of non-replication. Formal prediction intervals can also provide realistic interpretations and comparisons of P-values associated with different estimated effect sizes and sample sizes.
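The dependence of the interval on only the initial P-value and the sample-size ratio can be illustrated with a minimal sketch. It assumes a one-sided large-sample z-test, the same true effect in both studies, and no prior information about that effect, so that the replication z-score given the initial one is approximately normal with mean sqrt(c) * z1 and variance 1 + c, where c is the replication-to-initial sample-size ratio. The function names `upper_tail` and `replication_p_interval` are hypothetical, and this is not the authors' calculator; their exact formulation may differ.

```python
import math
from statistics import NormalDist


def upper_tail(z: float) -> float:
    """One-sided P-value P(Z >= z) for a standard normal Z.

    erfc keeps precision for very small tail probabilities.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def replication_p_interval(p_initial: float,
                           size_ratio: float = 1.0,
                           coverage: float = 0.95) -> tuple[float, float]:
    """Approximate prediction interval for a replication P-value.

    Assumptions (illustrative, not taken from the paper's text):
    one-sided z-test, same true effect in both studies, flat prior.
    size_ratio is n_replication / n_initial.
    """
    if not 0.0 < p_initial < 1.0:
        raise ValueError("p_initial must lie strictly between 0 and 1")
    if size_ratio <= 0.0:
        raise ValueError("size_ratio must be positive")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")

    std_normal = NormalDist()
    # z-score implied by the initial one-sided P-value.
    z_initial = -std_normal.inv_cdf(p_initial)
    # Predictive mean and standard deviation of the replication z-score.
    center = math.sqrt(size_ratio) * z_initial
    spread = math.sqrt(1.0 + size_ratio)
    half_width = std_normal.inv_cdf(0.5 + coverage / 2.0) * spread
    # A larger z-score means a smaller P-value, so the bounds swap.
    return upper_tail(center + half_width), upper_tail(center - half_width)


if __name__ == "__main__":
    for ratio in (0.5, 1.0, 4.0):
        low, high = replication_p_interval(5e-8, size_ratio=ratio)
        print(f"ratio={ratio}: {low:.2e} to {high:.2e}")
```

Under these assumptions neither the effect size nor the initial sample size appears in the calculation, which matches the abstract's description, and the interval for an equal-sized replication of a genome-wide significant result spans many orders of magnitude.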

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df9f/4255087/0495a3cd91cf/mp2013184f1.jpg
