

Tag SNP selection for candidate gene association studies using HapMap and gene resequencing data.

Authors

Xu Zongli, Kaplan Norman L, Taylor Jack A

Affiliation

Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.

Publication

Eur J Hum Genet. 2007 Oct;15(10):1063-70. doi: 10.1038/sj.ejhg.5201875. Epub 2007 Jun 13.

Abstract

HapMap provides linkage disequilibrium (LD) information on a sample of 3.7 million SNPs that can be used for tag SNP selection in whole-genome association studies. HapMap can also be used for tag SNP selection in candidate genes, although its performance has yet to be evaluated against gene resequencing data, where there is near-complete SNP ascertainment. The Environmental Genome Project (EGP) is the largest gene resequencing effort to date with over 500 resequenced genes. We used HapMap data to select tag SNPs and calculated the proportions of common SNPs (MAF ≥ 0.05) tagged (r² ≥ 0.8) for each of 127 EGP Panel 2 genes where individual ethnic information was available. Median gene-tagging proportions are 50, 80 and 74% for African, Asian, and European groups, respectively. These low gene-tagging proportions may be problematic for some candidate gene studies. In addition, although HapMap targeted nonsynonymous SNPs (nsSNPs), we estimate only approximately 30% of nonsynonymous SNPs in EGP are in high LD with any HapMap SNP. We show that gene-tagging proportions can be improved by adding a relatively small number of tag SNPs that were selected based on resequencing data. We also demonstrate that ethnic-mixed data can be used to improve HapMap gene-tagging proportions, but are not as efficient as ethnic-specific data. Finally, we generalized the greedy algorithm proposed by Carlson et al. (2004) to select tag SNPs for multiple populations and implemented the algorithm in a freely available software package, mPopTag.
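The two quantities the abstract relies on, greedy tag SNP selection at an r² threshold and the gene-tagging proportion, can be illustrated with a minimal single-population sketch. This is not the mPopTag implementation and does not reproduce the multi-population generalization; the function names, the pairwise r² dictionary layout, and the toy SNP identifiers below are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Pair = FrozenSet[str]


def r2_between(r2: Dict[Pair, float], a: str, b: str) -> float:
    """Pairwise r^2 lookup; a SNP tags itself, and missing pairs count as 0."""
    if a == b:
        return 1.0
    return r2.get(frozenset((a, b)), 0.0)


def greedy_tags(snps: Iterable[str], r2: Dict[Pair, float],
                threshold: float = 0.8) -> List[str]:
    """Simplified greedy selection: repeatedly pick the untagged SNP that
    covers the most untagged SNPs at r^2 >= threshold."""
    untagged: Set[str] = set(snps)
    tags: List[str] = []
    while untagged:
        # Sorting makes tie-breaking deterministic (first name wins).
        best = max(
            sorted(untagged),
            key=lambda c: sum(r2_between(r2, c, s) >= threshold for s in untagged),
        )
        tags.append(best)
        untagged -= {s for s in untagged if r2_between(r2, best, s) >= threshold}
    return tags


def tagging_proportion(tags: Iterable[str], common_snps: Iterable[str],
                       r2: Dict[Pair, float], threshold: float = 0.8) -> float:
    """Fraction of common SNPs in high LD with at least one tag SNP."""
    tag_list = list(tags)
    common = list(common_snps)
    if not common:
        return 0.0
    tagged = sum(
        any(r2_between(r2, t, s) >= threshold for t in tag_list) for s in common
    )
    return tagged / len(common)


# Hypothetical toy data: snpA..snpD stand in for SNPs genotyped in a
# HapMap-like panel; snpE is seen only in resequencing data.
toy_r2: Dict[Pair, float] = {
    frozenset(("snpA", "snpB")): 0.90,
    frozenset(("snpA", "snpC")): 0.85,
    frozenset(("snpB", "snpC")): 0.70,
    frozenset(("snpD", "snpE")): 0.50,
}
panel_snps = ["snpA", "snpB", "snpC", "snpD"]
resequenced_common_snps = panel_snps + ["snpE"]

toy_tags = greedy_tags(panel_snps, toy_r2)
toy_proportion = tagging_proportion(toy_tags, resequenced_common_snps, toy_r2)
```

In this toy case the tags are chosen from the panel SNPs only, while the proportion is measured against the fuller resequenced set, which mirrors the comparison described in the abstract: a SNP absent from the panel and in weak LD with every tag remains untagged.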

