Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies.

Authors

Hoggart Clive J, Whittaker John C, De Iorio Maria, Balding David J

Affiliation

Department of Epidemiology and Public Health, Imperial College, London, United Kingdom.

Publication

PLoS Genet. 2008 Jul 25;4(7):e1000130. doi: 10.1371/journal.pgen.1000130.

DOI: 10.1371/journal.pgen.1000130

PMID: 18654633

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2464715/

Abstract

Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
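To make the idea of "posterior mode estimates with a sharp prior mode at zero" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Laplace (double-exponential) prior, for which the posterior mode coincides with L1-penalised logistic regression, rather than the normal-exponential-gamma prior favoured in the abstract. It also uses plain proximal gradient descent instead of the paper's search algorithm. The data are simulated, and every size, effect, penalty value, and name (`fit_l1_logistic`, `lam`, `causal`) is a hypothetical choice for illustration. Only additive (0/1/2) genotype coding is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated toy data; all sizes and effect sizes are illustrative assumptions.
n_subjects, n_snps = 1000, 200
causal = [10, 50]  # hypothetical causal SNP indices
genotypes = rng.binomial(2, 0.3, size=(n_subjects, n_snps)).astype(float)  # additive 0/1/2 coding
X = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)  # standardise each SNP

true_beta = np.zeros(n_snps)
true_beta[causal] = 1.0
risk = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = rng.binomial(1, risk).astype(float)  # simulated case (1) / control (0) status


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink by t and set small values to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_l1_logistic(X, y, lam, n_iter=500):
    """Posterior mode under a Laplace prior (L1-penalised logistic regression),
    computed by proximal gradient descent on the mean negative log-likelihood."""
    n, p = X.shape
    intercept, beta = 0.0, np.zeros(p)
    # Step size 1/L, where L bounds the curvature of the mean logistic loss.
    design = np.column_stack([np.ones(n), X])
    step = 4.0 * n / np.linalg.norm(design, 2) ** 2
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
        resid = prob - y
        intercept -= step * resid.mean()           # intercept is not penalised
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return intercept, beta


# lam is a hand-picked penalty for this toy example, not a calibrated threshold.
intercept_hat, beta_hat = fit_l1_logistic(X, y, lam=0.08)
selected = np.flatnonzero(beta_hat).tolist()  # SNPs with non-zero coefficient estimates
print("selected SNPs:", selected)
```

In this sketch, a SNP counts as selected exactly when its coefficient estimate is non-zero, mirroring the interpretation given in the abstract. The penalty `lam` plays the role that the prior's sharpness plays in controlling false positives; the abstract's explicit type-I error approximation is not reproduced here.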
