Conditional random fields for fast, large-scale genome-wide association studies.

Affiliation

Microsoft Research, Redmond, Washington, United States of America.

Publication information

PLoS One. 2011;6(7):e21591. doi: 10.1371/journal.pone.0021591. Epub 2011 Jul 12.

DOI:10.1371/journal.pone.0021591
PMID:21765897
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3134455/
Abstract

Understanding the role of genetic variation in human diseases remains an important open problem in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs with phenotypes has been confounded by hidden factors such as population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, including linear mixed-effect models (LMMs) and methods that adjust the data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. The runtime of our method scales quadratically in the number of individuals being studied, with only a modest loss in statistical power compared to LMM-based and PCA-based methods when tested on synthetic data that was generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime than LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science.
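
The confounding problem described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the conditional random field model of the paper: it simulates two subpopulations whose allele frequencies differ, gives the phenotype a population-level shift but no causal SNP, and compares a naive per-SNP test with one that adjusts for the top principal component of the genotype matrix (the PCA-style correction the abstract mentions). All function names, sample sizes, and effect sizes are hypothetical choices made for illustration.

```python
import numpy as np

def simulate(n_per_pop=200, n_snps=500, seed=0):
    """Two subpopulations, no causal SNPs; phenotype depends only on population."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    f0 = rng.uniform(0.1, 0.9, n_snps)                      # allele freqs, population 0
    f1 = np.clip(f0 + rng.normal(0, 0.15, n_snps), 0.05, 0.95)  # drifted freqs, population 1
    freqs = np.where(pop[:, None] == 0, f0, f1)             # shape (n, n_snps)
    G = rng.binomial(2, freqs).astype(float)                # genotypes coded 0/1/2
    y = 1.0 * pop + rng.normal(0, 1, pop.size)              # hidden population effect
    return G, y

def assoc_chi2(G, y, covars=None):
    """Squared t-statistic per SNP from a linear regression of y on the SNP,
    after regressing an intercept (and optional covariates) out of both."""
    n = y.size
    C = np.ones((n, 1)) if covars is None else np.column_stack([np.ones(n), covars])
    Q, _ = np.linalg.qr(C)                                  # orthonormal basis of covariates
    yr = y - Q @ (Q.T @ y)
    Gr = G - Q @ (Q.T @ G)
    r = (Gr.T @ yr) / np.sqrt((Gr ** 2).sum(axis=0) * (yr ** 2).sum())
    dof = n - C.shape[1] - 1
    return dof * r ** 2 / (1 - r ** 2)

def top_pcs(G, k=1):
    """Top-k principal component scores of the standardized genotype matrix."""
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0                                       # guard against monomorphic SNPs
    Z = (G - G.mean(axis=0)) / sd
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]

def demo():
    G, y = simulate()
    naive = assoc_chi2(G, y).mean()                         # about 1 if well calibrated
    adjusted = assoc_chi2(G, y, covars=top_pcs(G, k=1)).mean()
    return naive, adjusted

if __name__ == "__main__":
    naive, adjusted = demo()
    print(f"mean test statistic, naive: {naive:.2f}, PC-adjusted: {adjusted:.2f}")
```

Under the null hypothesis the mean statistic should be close to 1, so a naive value well above 1 reflects spurious associations induced by population structure. The singular value decomposition in this sketch is the step whose cost grows quickly with the number of individuals, which is the kind of scaling concern that motivates faster methods.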

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/771e/3134455/81f04e305adc/pone.0021591.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/771e/3134455/882afe81a023/pone.0021591.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/771e/3134455/c39caabb68a9/pone.0021591.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/771e/3134455/247256c9092b/pone.0021591.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/771e/3134455/fc0ede317451/pone.0021591.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/771e/3134455/9e11de0d1e92/pone.0021591.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/771e/3134455/39de6954449d/pone.0021591.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/771e/3134455/aad97af2b3f9/pone.0021591.g008.jpg
