Enhanced localization of genetic samples through linkage-disequilibrium correction.

Affiliation

The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel.

Publication

Am J Hum Genet. 2013 Jun 6;92(6):882-94. doi: 10.1016/j.ajhg.2013.04.023. Epub 2013 May 30.

Abstract

Characterizing the spatial patterns of genetic diversity in human populations has a wide range of applications, from detecting genetic mutations associated with disease to inferring human history. Current approaches, including the widely used principal-component analysis, are not suited for the analysis of linked markers, and local and long-range linkage disequilibrium (LD) can dramatically reduce the accuracy of spatial localization when unaccounted for. To overcome this, we have introduced an approach that performs spatial localization of individuals on the basis of their genetic data and explicitly models LD among markers by using a multivariate normal distribution. By leveraging external reference panels, we derive closed-form solutions to the optimization procedure to achieve a computationally efficient method that can handle large data sets. We validate the method on empirical data from a large sample of European individuals from the POPRES data set, as well as on a large sample of individuals of Spanish ancestry. First, we show that by modeling LD, we achieve accuracy superior to that of existing methods. Importantly, whereas other methods show decreased performance when dense marker panels are used in the inference, our approach improves in accuracy as more markers become available. Second, we show that accurate localization of genetic data can be achieved with only a part of the genome, and this could potentially enable the spatial localization of admixed samples that have a fraction of their genome originating from a given continent. Finally, we demonstrate that our approach is resistant to distortions resulting from long-range LD regions; such distortions can dramatically bias the results when unaccounted for.
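
The abstract describes the method only at a high level. The sketch below is a rough Python/NumPy illustration of why modeling LD with a multivariate normal distribution can still yield a closed-form location estimate. It rests on a toy assumption that is not taken from the paper: each marker's expected genotype varies linearly with a 2-D location x, so g ~ N(f0 + Bx, Σ). Under that model the maximum-likelihood location is the generalized least-squares solution x̂ = (BᵀΣ⁻¹B)⁻¹BᵀΣ⁻¹(g − f0). Setting Σ = I gives an LD-naive estimator that treats every marker as independent. The names localize, simulate, f0 and B, and the simulated LD block, are all hypothetical. In practice Σ would have to be estimated (for example from an external reference panel), and real genotypes are discrete rather than Gaussian.

```python
import numpy as np


def localize(g, f0, B, Sigma=None):
    """Closed-form ML location x under the toy model g ~ N(f0 + B @ x, Sigma).

    g: (m,) genotype vector; f0: (m,) expected genotype at x = 0;
    B: (m, 2) per-marker gradient w.r.t. location; Sigma: (m, m) LD covariance.
    With Sigma=None the markers are treated as independent (Sigma = I),
    i.e. ordinary least squares that ignores LD.
    """
    r = g - f0
    if Sigma is None:
        Si_B, Si_r = B, r
    else:
        # Solve Sigma @ W = [B | r] instead of forming Sigma^-1 explicitly.
        W = np.linalg.solve(Sigma, np.column_stack([B, r]))
        Si_B, Si_r = W[:, :-1], W[:, -1]
    # Normal equations: (B^T Sigma^-1 B) x = B^T Sigma^-1 (g - f0)
    return np.linalg.solve(B.T @ Si_B, B.T @ Si_r)


def simulate(n_trials=500, n_indep=40, n_block=40, rho=0.95, seed=0):
    """Return (RMSE modeling LD, RMSE ignoring LD) over simulated individuals."""
    rng = np.random.default_rng(seed)
    m = n_indep + n_block
    # Hypothetical gradients: independent markers get random ones,
    # while all markers in the LD block share a single gradient.
    B = np.vstack([rng.normal(size=(n_indep, 2)),
                   np.tile([1.0, 0.5], (n_block, 1))])
    f0 = rng.uniform(0.2, 1.8, size=m)  # hypothetical mean dosage at x = 0
    # Unit variances; the last n_block markers form one LD block whose
    # pairwise correlation is rho (compound-symmetric covariance).
    Sigma = np.eye(m)
    Sigma[n_indep:, n_indep:] = rho + (1 - rho) * np.eye(n_block)
    L = np.linalg.cholesky(Sigma)  # used to draw correlated noise
    err_gls, err_ols = [], []
    for _ in range(n_trials):
        x_true = rng.uniform(-1, 1, size=2)
        g = f0 + B @ x_true + L @ rng.normal(size=m)
        err_gls.append(np.sum((localize(g, f0, B, Sigma) - x_true) ** 2))
        err_ols.append(np.sum((localize(g, f0, B) - x_true) ** 2))
    return np.sqrt(np.mean(err_gls)), np.sqrt(np.mean(err_ols))


if __name__ == "__main__":
    rmse_gls, rmse_ols = simulate()
    print(f"RMSE modeling LD: {rmse_gls:.3f}; ignoring LD: {rmse_ols:.3f}")
```

In this toy setting the 40 correlated markers carry roughly one marker's worth of independent information. The LD-naive estimator counts them 40 times, so their shared noise dominates its estimate, while the GLS weighting discounts them. This loosely mirrors the abstract's points about dense panels and long-range LD regions, but it is an illustration only, not a reproduction of the paper's model or results.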
