Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies.

Authors

Wojcik Genevieve L, Fuchsberger Christian, Taliun Daniel, Welch Ryan, Martin Alicia R, Shringarpure Suyash, Carlson Christopher S, Abecasis Goncalo, Kang Hyun Min, Boehnke Michael, Bustamante Carlos D, Gignoux Christopher R, Kenny Eimear E

Affiliations

Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305.

Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109.

Publication

G3 (Bethesda). 2018 Oct 3;8(10):3255-3267. doi: 10.1534/g3.118.200502.

Abstract

The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome for improved imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts with single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
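The evaluation metric named in the abstract, the mean imputed r at untyped sites summarized per population, can be illustrated with a small sketch. This is not the authors' pipeline: it assumes r is read as the squared Pearson correlation (r²) between true genotypes and imputed dosages, that sites with no variation in a population are skipped, and that the imputation itself has already been run elsewhere. The function names, the population labels `POP_A` and `POP_B`, and the toy data are all hypothetical.

```python
import math


def site_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages at one site."""
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    # Assumption: a site with no variation has an undefined correlation, so report NaN.
    if var_t == 0 or var_d == 0:
        return math.nan
    return cov * cov / (var_t * var_d)


def mean_imputed_r2(true_rows, dosage_rows):
    """Mean r^2 over untyped sites (one row per site), ignoring undefined sites."""
    values = [site_r2(t, d) for t, d in zip(true_rows, dosage_rows)]
    defined = [v for v in values if not math.isnan(v)]
    return sum(defined) / len(defined) if defined else math.nan


def mean_r2_by_population(true_rows, dosage_rows, populations):
    """Mean imputed r^2 computed separately for each population label."""
    result = {}
    for pop in sorted(set(populations)):
        idx = [i for i, p in enumerate(populations) if p == pop]
        t_sub = [[row[i] for i in idx] for row in true_rows]
        d_sub = [[row[i] for i in idx] for row in dosage_rows]
        result[pop] = mean_imputed_r2(t_sub, d_sub)
    return result


# Toy data (hypothetical): 2 untyped sites, 6 samples from two populations.
populations = ["POP_A"] * 3 + ["POP_B"] * 3
true_rows = [[0, 1, 2, 0, 1, 2],
             [0, 0, 1, 1, 1, 1]]
dosage_rows = [[0.1, 1.0, 1.9, 0.5, 1.0, 1.5],
               [0.0, 0.1, 0.9, 1.0, 1.0, 1.0]]

for pop, value in mean_r2_by_population(true_rows, dosage_rows, populations).items():
    print(pop, round(value, 4))
```

Under these assumptions, a candidate set of tag SNPs would be scored by masking the untyped sites, imputing them with a standard tool, and comparing the per-population means; a multi-ethnic design would favor tags that raise the metric in several populations at once.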
