Re-assessment of multiple testing strategies for more efficient genome-wide association studies.

Affiliations

Risk Analysis Research Center, The Institute of Statistical Mathematics, Tachikawa, Tokyo, 190-8562, Japan.

Department of Data Science, The Institute of Statistical Mathematics, Tachikawa, Tokyo, 190-8562, Japan.

Publication information

Eur J Hum Genet. 2018 Jul;26(7):1038-1048. doi: 10.1038/s41431-018-0125-3. Epub 2018 Mar 9.

DOI:10.1038/s41431-018-0125-3

PMID:29523830

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC6018732/

Abstract

Although enormous costs have been dedicated to discovering relevant disease-related genetic variants, especially in genome-wide association studies (GWASs), only a small fraction of estimated heritability can be explained by these results. This is the so-called missing heritability problem. The conventional use of overly conservative multiple testing strategies based on controlling the familywise error rate (FWER), in particular with a genome-wide significance threshold of P < 5 × 10⁻⁸, is one of the most important issues from a statistical perspective. To help resolve this problem, we performed comprehensive re-assessments of currently available strategies using recently published, extremely large-scale GWAS data sets of rheumatoid arthritis and schizophrenia (>50,000 subjects). The estimates of statistical power averaged for all disease-related genetic variants of the standard FWER-based strategy were only 0.09% for the rheumatoid arthritis data and 0.04% for the schizophrenia data. To design more efficient strategies, we also conducted an extensive comparison of multiple testing strategies by applying false discovery rate (FDR)-controlling procedures to these data sets and simulations, and found that the FDR-based procedures achieved higher power than the FWER-based strategy, even at a strict FDR level (e.g., FDR = 1%). We also discuss a useful alternative measure, namely "partial power," which is an averaged power for detecting the clinically and biologically meaningful genetic factors with the largest effects. Simulation results suggest that the FDR-based procedures can achieve sufficient partial power (>80%) for detecting these factors (odds ratios of >1.05) with 80,000 subjects, and thus this may be a useful measure for defining realistic objectives of future GWASs.
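
The contrast between a fixed genome-wide threshold and FDR control can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes independent tests and a hypothetical effect-size distribution, and it uses the Benjamini-Hochberg step-up procedure as one common FDR-controlling method. The function names, the simulation settings, and the "top 10% of effects" reading of partial power are illustrative assumptions.

```python
import math
import random

GENOME_WIDE_THRESHOLD = 5e-8  # conventional fixed significance threshold


def two_sided_p(z):
    """Two-sided p-value of a z-statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return set(order[:cutoff_rank])


def simulate(m=20000, n_causal=400, effect_sd=3.0, seed=0):
    """Simulate independent z-statistics; all settings here are hypothetical."""
    rng = random.Random(seed)
    # The first n_causal variants get a non-zero mean; the rest are null.
    means = [abs(rng.gauss(0.0, effect_sd)) if i < n_causal else 0.0
             for i in range(m)]
    pvalues = [two_sided_p(rng.gauss(mu, 1.0)) for mu in means]
    return means, pvalues


def power(rejected, targets):
    """Fraction of the target variants that were rejected."""
    return len(rejected & targets) / len(targets)


if __name__ == "__main__":
    means, pvalues = simulate()
    causal = {i for i, mu in enumerate(means) if mu > 0.0}
    # Assumed definition of "largest effects": top 10% of causal variants.
    ranked = sorted(causal, key=lambda i: means[i], reverse=True)
    strongest = set(ranked[:len(causal) // 10])
    fixed = {i for i, p in enumerate(pvalues) if p < GENOME_WIDE_THRESHOLD}
    bh = benjamini_hochberg(pvalues, q=0.01)
    for name, hits in (("fixed threshold", fixed), ("BH, FDR = 1%", bh)):
        print(name,
              "average power:", power(hits, causal),
              "partial power:", power(hits, strongest))
```

With 20,000 tests and an FDR level of 1%, the smallest Benjamini-Hochberg cutoff is 0.01 / 20,000 = 5 × 10⁻⁷, which already exceeds 5 × 10⁻⁸, so in this setting the FDR procedure rejects every variant the fixed threshold rejects and possibly more. Real GWAS data involve millions of correlated variants, so the numbers printed here say nothing about the power estimates reported in the abstract.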

Similar articles

A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL.

Genet Epidemiol. 2017 Apr;41(3):251-258. doi: 10.1002/gepi.22029. Epub 2017 Jan 15.

Weighted multiple testing procedures in genome-wide association studies.

PeerJ. 2023 Jun 15;11:e15369. doi: 10.7717/peerj.15369. eCollection 2023.

Multiple testing in genome-wide association studies via hidden Markov models.

Bioinformatics. 2009 Nov 1;25(21):2802-8. doi: 10.1093/bioinformatics/btp476. Epub 2009 Aug 4.

Power and type I error rate of false discovery rate approaches in genome-wide association studies.

BMC Genet. 2005 Dec 30;6 Suppl 1(Suppl 1):S134. doi: 10.1186/1471-2156-6-S1-S134.

Hidden Markov models for controlling false discovery rate in genome-wide association analysis.

Methods Mol Biol. 2012;802:337-44. doi: 10.1007/978-1-61779-400-1_22.

A mixed model reduces spurious genetic associations produced by population stratification in genome-wide association studies.

Genomics. 2015 Apr;105(4):191-6. doi: 10.1016/j.ygeno.2015.01.006. Epub 2015 Jan 30.

Glutamate Networks Implicate Cognitive Impairments in Schizophrenia: Genome-Wide Association Studies of 52 Cognitive Phenotypes.

Schizophr Bull. 2015 Jul;41(4):909-18. doi: 10.1093/schbul/sbu171. Epub 2014 Dec 22.

LPG: A four-group probabilistic approach to leveraging pleiotropy in genome-wide association studies.

BMC Genomics. 2018 Jun 28;19(1):503. doi: 10.1186/s12864-018-4851-2.

Improving power of genome-wide association studies with weighted false discovery rate control and prioritized subset analysis.

PLoS One. 2012;7(4):e33716. doi: 10.1371/journal.pone.0033716. Epub 2012 Apr 9.

Cited by

Weighted multiple testing procedures in genome-wide association studies.

PeerJ. 2023 Jun 15;11:e15369. doi: 10.7717/peerj.15369. eCollection 2023.

Construction of a multiple-class classifier based on mRNAs and lncRNA FAM66A and PSORS1C3 for predicting distant metastasis in lung adenocarcinoma.

Ann Transl Med. 2022 Oct;10(20):1129. doi: 10.21037/atm-22-4651.

Role of Damage-Associated Molecular Patterns in Light of Modern Environmental Research: A Tautological Approach.

Int J Environ Res. 2020;14(5):583-604. doi: 10.1007/s41742-020-00276-z. Epub 2020 Aug 9.

Improving predictive models for Alzheimer's disease using GWAS data by incorporating misclassified samples modeling.

PLoS One. 2020 Apr 23;15(4):e0232103. doi: 10.1371/journal.pone.0232103. eCollection 2020.

Polygenic Risk Score Contribution to Psychosis Prediction in a Target Population of Persons at Clinical High Risk.

Am J Psychiatry. 2020 Feb 1;177(2):155-163. doi: 10.1176/appi.ajp.2019.18060721. Epub 2019 Nov 12.

References

Empirical Bayes Estimation of Semi-parametric Hierarchical Mixture Models for Unbiased Characterization of Polygenic Disease Architectures.

Front Genet. 2018 Apr 24;9:115. doi: 10.3389/fgene.2018.00115. eCollection 2018.

A global reference for human genetic variation.

Nature. 2015 Oct 1;526(7571):68-74. doi: 10.1038/nature15393.

Biological insights from 108 schizophrenia-associated genetic loci.

Nature. 2014 Jul 24;511(7510):421-7. doi: 10.1038/nature13595. Epub 2014 Jul 22.

Genetics of rheumatoid arthritis contributes to biology and drug discovery.

Nature. 2014 Feb 20;506(7488):376-81. doi: 10.1038/nature12873. Epub 2013 Dec 25.

The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Nucleic Acids Res. 2014 Jan;42(Database issue):D1001-6. doi: 10.1093/nar/gkt1229. Epub 2013 Dec 6.

Genome-wide association analysis identifies 13 new risk loci for schizophrenia.

Nat Genet. 2013 Oct;45(10):1150-9. doi: 10.1038/ng.2742. Epub 2013 Aug 25.

Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis.

Nat Genet. 2012 Mar 25;44(5):483-9. doi: 10.1038/ng.2232.

The optimal discovery procedure in multiple significance testing: an empirical Bayes approach.

Stat Med. 2012 Jan 30;31(2):165-76. doi: 10.1002/sim.4375. Epub 2011 Oct 4.

Genome-wide association study identifies five new schizophrenia loci.

Nat Genet. 2011 Sep 18;43(10):969-76. doi: 10.1038/ng.940.

Estimating effect sizes of differentially expressed genes for power and sample-size assessments in microarray experiments.

Biometrics. 2011 Dec;67(4):1225-35. doi: 10.1111/j.1541-0420.2011.01618.x. Epub 2011 May 31.
