Exploiting genome structure in association analysis.

Author information

Kim Seyoung, Xing Eric P

Affiliation

School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Publication information

J Comput Biol. 2014 Apr;21(4):345-60. doi: 10.1089/cmb.2009.0224. Epub 2011 May 6.

Abstract

A genome-wide association study involves examining a large number of single-nucleotide polymorphisms (SNPs) to identify those that are significantly associated with a given phenotype while keeping the false positive rate low. Although haplotype-based association methods have been proposed to accommodate the correlation among nearby SNPs in linkage disequilibrium, none of these methods directly incorporates structural information such as recombination events along the chromosome. In this paper, we propose a new approach to association mapping, called stochastic block lasso, that exploits prior knowledge of the linkage disequilibrium structure in the genome, such as recombination rates and distances between adjacent SNPs, to increase the power to detect true associations while reducing false positives. Following a standard linear regression framework with the genotypes as inputs and the phenotype as output, our method places a sparsity-enforcing Laplacian prior on the regression coefficients, augmented by a first-order Markov process along the sequence of SNPs that encodes the prior information on the linkage disequilibrium structure. The Markov-chain prior models the structural dependencies between each pair of adjacent SNPs, which allows us to search for associated SNPs in a coupled manner and to borrow strength from multiple nearby SNPs. Our results on HapMap-simulated datasets and mouse datasets show a significant advantage in incorporating prior knowledge of linkage disequilibrium structure for marker identification in whole-genome association studies.
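
To make the structure of such a model concrete, the following sketch (assuming NumPy) writes out an unnormalized log posterior with the three ingredients named above:
- a Gaussian linear regression of the phenotype on genotypes;
- a Laplace prior on the regression coefficients;
- a first-order Markov chain over per-SNP association indicators, whose tendency to keep the same state between neighbors decays with recombination rate times distance.

The transition parameterization, the hyperparameters `pi1`, `lam` and `sigma2`, and all function names are hypothetical illustrations, not the authors' formulation. The paper's inference procedure is not reproduced.

```python
import numpy as np

# Hypothetical parameterization: NOT the paper's exact transition model.
def transition_matrix(rate, dist, pi1=0.1):
    """2x2 transition matrix between adjacent SNP indicators (0 = null, 1 = associated).

    With probability exp(-rate * dist) the indicator is copied from the previous
    SNP; otherwise it is redrawn from the stationary distribution (1 - pi1, pi1).
    Low recombination and short distance therefore favor contiguous blocks.
    """
    keep = np.exp(-rate * dist)
    stationary = np.array([1.0 - pi1, pi1])
    return keep * np.eye(2) + (1.0 - keep) * stationary[None, :]


def log_markov_prior(s, rates, dists, pi1=0.1):
    """Log probability of an indicator sequence s under the first-order chain."""
    s = np.asarray(s)
    lp = np.log(pi1 if s[0] == 1 else 1.0 - pi1)  # first SNP drawn from the stationary law
    for j in range(len(s) - 1):
        T = transition_matrix(rates[j], dists[j], pi1)
        lp += np.log(T[s[j], s[j + 1]])
    return lp


def log_laplace_prior(beta, s, lam=1.0):
    """Laplace(0, 1/lam) log density on the coefficients of SNPs with s_j = 1."""
    active = beta[np.asarray(s) == 1]
    return np.sum(np.log(lam / 2.0) - lam * np.abs(active))


def log_likelihood(y, X, beta, sigma2=1.0):
    """Gaussian linear-regression log likelihood, genotypes X -> phenotype y."""
    resid = y - X @ beta
    return -0.5 * len(y) * np.log(2.0 * np.pi * sigma2) - 0.5 * (resid @ resid) / sigma2


def log_joint(y, X, beta, s, rates, dists, lam=1.0, sigma2=1.0, pi1=0.1):
    """Unnormalized log posterior of (beta, s); beta_j must be 0 wherever s_j = 0."""
    s = np.asarray(s)
    if np.any(beta[s == 0] != 0):
        return -np.inf
    return (log_likelihood(y, X, beta, sigma2)
            + log_laplace_prior(beta, s, lam)
            + log_markov_prior(s, rates, dists, pi1))


# Toy usage with made-up inputs (6 SNPs, 50 individuals, genotypes coded 0/1/2).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(50, 6)).astype(float)
rates = np.full(5, 0.01)   # hypothetical recombination rate per unit distance
dists = np.full(5, 10.0)   # hypothetical distances between adjacent SNPs
block = np.array([0, 1, 1, 1, 0, 0])
scattered = np.array([1, 0, 1, 0, 1, 0])
beta = np.array([0.0, 0.5, 0.5, 0.5, 0.0, 0.0])
y = X @ beta + rng.normal(scale=1.0, size=50)

print(log_markov_prior(block, rates, dists) > log_markov_prior(scattered, rates, dists))  # True
print(log_joint(y, X, beta, block, rates, dists))
```

With rate times distance equal to 0.1 here, the chain strongly prefers to keep the same state between neighboring SNPs. As a result, the contiguous block of associated SNPs receives a much higher prior than the scattered pattern. This illustrates the "coupled" search over nearby SNPs in a simplified, hypothetical form.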
