Enhanced localization of genetic samples through linkage-disequilibrium correction.

Author information

The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel.

Publication information

Am J Hum Genet. 2013 Jun 6;92(6):882-94. doi: 10.1016/j.ajhg.2013.04.023. Epub 2013 May 30.

Abstract

Characterizing the spatial patterns of genetic diversity in human populations has a wide range of applications, from detecting genetic mutations associated with disease to inferring human history. Current approaches, including the widely used principal-component analysis, are not suited for the analysis of linked markers, and local and long-range linkage disequilibrium (LD) can dramatically reduce the accuracy of spatial localization when unaccounted for. To overcome this, we have introduced an approach that performs spatial localization of individuals on the basis of their genetic data and explicitly models LD among markers by using a multivariate normal distribution. By leveraging external reference panels, we derive closed-form solutions to the optimization procedure to achieve a computationally efficient method that can handle large data sets. We validate the method on empirical data from a large sample of European individuals from the POPRES data set, as well as on a large sample of individuals of Spanish ancestry. First, we show that by modeling LD, we achieve accuracy superior to that of existing methods. Importantly, whereas other methods show decreased performance when dense marker panels are used in the inference, our approach improves in accuracy as more markers become available. Second, we show that accurate localization of genetic data can be achieved with only a part of the genome, and this could potentially enable the spatial localization of admixed samples that have a fraction of their genome originating from a given continent. Finally, we demonstrate that our approach is resistant to distortions resulting from long-range LD regions; such distortions can dramatically bias the results when unaccounted for.
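
The abstract names the ingredients (a multivariate normal model of LD, an external reference panel, and a closed-form optimization) but does not give the equations. The following is therefore only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that the expected genotype at each marker varies linearly with a two-dimensional location `x` (intercepts `a`, slopes `B`), and that the LD covariance `Sigma` has already been estimated from a reference panel. Under those assumptions the maximum-likelihood location is a generalized least-squares estimate. The names `localize_gls`, `a`, `B`, `Sigma`, and `ridge` are hypothetical.

```python
import numpy as np


def localize_gls(g, a, B, Sigma, ridge=1e-6):
    """Closed-form location estimate under the assumed model g ~ N(a + B x, Sigma).

    g     : (m,)   genotype vector of one individual
    a     : (m,)   per-marker intercepts (assumed known)
    B     : (m, 2) per-marker slopes with respect to location (assumed known)
    Sigma : (m, m) LD covariance among markers, e.g. from a reference panel
    ridge : small diagonal term that keeps the covariance invertible
    """
    m = g.shape[0]
    S = Sigma + ridge * np.eye(m)
    # Apply S^{-1} via linear solves instead of forming an explicit inverse.
    S_inv_B = np.linalg.solve(S, B)
    S_inv_r = np.linalg.solve(S, g - a)
    # Generalized least squares: x = (B^T S^-1 B)^-1 B^T S^-1 (g - a)
    return np.linalg.solve(B.T @ S_inv_B, B.T @ S_inv_r)


# Synthetic, noise-free example with an AR(1)-style LD covariance (assumption).
rng = np.random.default_rng(0)
m = 50
a = rng.uniform(0.2, 1.8, size=m)
B = rng.normal(0.0, 0.05, size=(m, 2))
idx = np.arange(m)
Sigma = 0.8 ** np.abs(idx[:, None] - idx[None, :])

x_true = np.array([0.3, -0.7])
g = a + B @ x_true
x_hat = localize_gls(g, a, B, Sigma)
```

Setting `Sigma` to the identity reduces this to ordinary least squares, which treats markers as independent. Weighting by the inverse LD covariance is what keeps a block of correlated markers from being counted as many independent pieces of evidence.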
