

Efficient selection of tagging single-nucleotide polymorphisms in multiple populations.

Authors

Howie Bryan N, Carlson Christopher S, Rieder Mark J, Nickerson Deborah A

Affiliation

Department of Genome Sciences, University of Washington, Box 357730, Seattle, WA 98195, USA.

Publication details

Hum Genet. 2006 Aug;120(1):58-68. doi: 10.1007/s00439-006-0182-5. Epub 2006 May 6.

Abstract

Common genetic polymorphism may explain a portion of the heritable risk for common diseases, so considerable effort has been devoted to finding and typing common single-nucleotide polymorphisms (SNPs) in the human genome. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), suggesting that only a subset of all SNPs (known as tagging SNPs, or tagSNPs) need to be genotyped for disease association studies. Based on the genetic differences that exist among human populations, most tagSNP sets are defined in a single population and applied only in populations that are closely related. To improve the efficiency of multi-population analyses, we have developed an algorithm called MultiPop-TagSelect that finds a near-minimal union of population-specific tagSNP sets across an arbitrary number of populations. We present this approach as an extension of LD-select, a tagSNP selection method that uses a greedy algorithm to group SNPs into bins based on their pairwise association patterns, although the MultiPop-TagSelect algorithm could be used with any SNP tagging approach that allows choices between nearly equivalent SNPs. We evaluate the algorithm by considering tagSNP selection in candidate-gene resequencing data and lower density whole-chromosome data. Our analysis reveals that an exhaustive search is often intractable, while the developed algorithm can quickly and reliably find near-optimal solutions even for difficult tagSNP selection problems. Using populations of African, Asian, and European ancestry, we also show that an optimal multi-population set of tagSNPs can be substantially smaller (up to 44%) than a typical set obtained through independent or sequential selection.
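The idea of choosing between nearly equivalent SNPs so that the union of population-specific tagSNP sets stays small can be illustrated with a short sketch. This is not the published MultiPop-TagSelect procedure, whose details the abstract does not give. It is a generic greedy hitting-set heuristic that assumes each population's bins have already been computed (for example by an LD-select-style method) and that each bin is a set of interchangeable candidate tagSNPs. The function name `greedy_multipop_tags`, the population labels, and the rs identifiers are all hypothetical.

```python
from collections import Counter


def greedy_multipop_tags(bins_by_pop):
    """Pick a small set of SNPs that tags every bin in every population.

    bins_by_pop maps a population label to a list of bins; each bin is a
    set of candidate tagSNP ids, any one of which is assumed to tag that bin.
    Generic greedy hitting-set heuristic, NOT the authors' algorithm.
    """
    # One entry per (population, bin) that still lacks a chosen tagSNP.
    uncovered = [
        (pop, frozenset(candidates))
        for pop, bins in bins_by_pop.items()
        for candidates in bins
    ]
    if any(not candidates for _, candidates in uncovered):
        raise ValueError("every bin needs at least one candidate tagSNP")

    chosen = []
    while uncovered:
        # Count how many uncovered bins each candidate SNP could tag.
        counts = Counter(snp for _, cands in uncovered for snp in cands)
        # Take the SNP covering the most bins; break ties by name so the
        # result is deterministic.
        best = min(counts, key=lambda snp: (-counts[snp], snp))
        chosen.append(best)
        uncovered = [(p, c) for p, c in uncovered if best not in c]
    return chosen


# Hypothetical toy input: three populations, two bins each.
example_bins = {
    "pop_A": [{"rs1", "rs2"}, {"rs3"}],
    "pop_B": [{"rs2", "rs4"}, {"rs3", "rs5"}],
    "pop_C": [{"rs2"}, {"rs5", "rs6"}],
}

if __name__ == "__main__":
    print(greedy_multipop_tags(example_bins))  # ['rs2', 'rs3', 'rs5']
```

In this toy input, picking the alphabetically first candidate in each bin independently per population would give the union {rs1, rs2, rs3, rs5}, four SNPs, whereas the shared selection needs three. A greedy heuristic of this kind is not guaranteed to find the minimum, which is consistent with the abstract's description of a near-minimal rather than minimal result.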

