Enhanced localization of genetic samples through linkage-disequilibrium correction.

Affiliation

The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel.

Publication information

Am J Hum Genet. 2013 Jun 6;92(6):882-94. doi: 10.1016/j.ajhg.2013.04.023. Epub 2013 May 30.

DOI:10.1016/j.ajhg.2013.04.023

PMID:23726367

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3675263/

Abstract

Characterizing the spatial patterns of genetic diversity in human populations has a wide range of applications, from detecting genetic mutations associated with disease to inferring human history. Current approaches, including the widely used principal-component analysis, are not suited for the analysis of linked markers, and local and long-range linkage disequilibrium (LD) can dramatically reduce the accuracy of spatial localization when unaccounted for. To overcome this, we have introduced an approach that performs spatial localization of individuals on the basis of their genetic data and explicitly models LD among markers by using a multivariate normal distribution. By leveraging external reference panels, we derive closed-form solutions to the optimization procedure to achieve a computationally efficient method that can handle large data sets. We validate the method on empirical data from a large sample of European individuals from the POPRES data set, as well as on a large sample of individuals of Spanish ancestry. First, we show that by modeling LD, we achieve accuracy superior to that of existing methods. Importantly, whereas other methods show decreased performance when dense marker panels are used in the inference, our approach improves in accuracy as more markers become available. Second, we show that accurate localization of genetic data can be achieved with only a part of the genome, and this could potentially enable the spatial localization of admixed samples that have a fraction of their genome originating from a given continent. Finally, we demonstrate that our approach is resistant to distortions resulting from long-range LD regions; such distortions can dramatically bias the results when unaccounted for.
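The abstract does not spell out the model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that the expected genotype at each marker varies linearly with location, `mu(x) = a + B x`, and that the genotype vector is multivariate normal with an LD covariance matrix `Sigma` estimated from a reference panel. Under those assumptions, the maximum-likelihood location has the closed form `x = (B^T Sigma^-1 B)^-1 B^T Sigma^-1 (g - a)`. The names `solve`, `localize`, `a`, `B`, and `Sigma`, and all numbers, are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def localize(g, a, B, Sigma):
    """Closed-form location estimate assuming g ~ N(a + B x, Sigma)."""
    m, d = len(g), len(B[0])
    resid = [g[i] - a[i] for i in range(m)]
    # Sigma^-1 applied to the residual and to each column of B.
    w_r = solve(Sigma, resid)
    w_B = [solve(Sigma, [B[i][k] for i in range(m)]) for k in range(d)]
    # Normal equations: (B^T Sigma^-1 B) x = B^T Sigma^-1 (g - a).
    lhs = [[sum(B[i][j] * w_B[k][i] for i in range(m)) for k in range(d)]
           for j in range(d)]
    rhs = [sum(B[i][j] * w_r[i] for i in range(m)) for j in range(d)]
    return solve(lhs, rhs)


# Hypothetical toy data: 4 markers, 2 spatial coordinates.
a = [1.0, 0.8, 1.2, 0.5]                 # assumed baseline mean genotypes
B = [[0.10, 0.00],                       # assumed per-marker spatial slopes
     [0.10, 0.02],
     [0.00, 0.20],
     [-0.05, 0.10]]
Sigma = [[0.50, 0.45, 0.00, 0.00],       # markers 0 and 1 are in strong LD
         [0.45, 0.50, 0.00, 0.00],
         [0.00, 0.00, 0.50, 0.10],
         [0.00, 0.00, 0.10, 0.50]]
x_true = [2.0, -1.0]
# Noise-free genotype vector generated from the assumed linear model.
g = [a[i] + B[i][0] * x_true[0] + B[i][1] * x_true[1] for i in range(4)]

x_hat = localize(g, a, B, Sigma)
print(x_hat)  # recovers x_true up to floating-point error in this noise-free case
```

Replacing `Sigma` with the identity matrix treats the markers as independent. In this sketch, that is the step at which two markers in strong LD would be counted as two separate pieces of evidence instead of being downweighted.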

Similar articles

ALDsuite: Dense marker MALD using principal components of ancestral linkage disequilibrium.

BMC Genet. 2015 Mar 7;16:23. doi: 10.1186/s12863-015-0179-y.

Novel probabilistic models of spatial genetic ancestry with applications to stratification correction in genome-wide association studies.

Bioinformatics. 2017 Mar 15;33(6):879-885. doi: 10.1093/bioinformatics/btw720.

HaploPOP: a software that improves population assignment by combining markers into haplotypes.

BMC Bioinformatics. 2015 Jul 31;16:242. doi: 10.1186/s12859-015-0661-6.

Genome screens using linkage disequilibrium tests: optimal marker characteristics and feasibility.

Am J Hum Genet. 1998 Dec;63(6):1872-85. doi: 10.1086/302139.

Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations.

BMC Bioinformatics. 2015 Jan 16;16:4. doi: 10.1186/s12859-014-0418-7.

Handling linkage disequilibrium in qualitative trait linkage analysis using dense SNPs: a two-step strategy.

BMC Genet. 2009 Aug 10;10:44. doi: 10.1186/1471-2156-10-44.

The expected power of genome-wide linkage disequilibrium testing using single nucleotide polymorphism markers for detecting a low-frequency disease variant.

Ann Hum Genet. 2002 Jul;66(Pt 4):297-306. doi: 10.1017/S0003480002001197.

Linkage disequilibrium and inference of ancestral recombination in 538 single-nucleotide polymorphism clusters across the human genome.

Am J Hum Genet. 2003 Aug;73(2):285-300. doi: 10.1086/377138. Epub 2003 Jul 3.

Performance of a blockwise approach in variable selection using linkage disequilibrium information.

BMC Bioinformatics. 2015 May 8;16:148. doi: 10.1186/s12859-015-0556-6.

Cited by

Genomic divergence landscape in recurrently hybridizing sister taxa suggests stable steady state between mutual gene flow and isolation.

Evol Lett. 2020 Nov 6;5(1):86-100. doi: 10.1002/evl3.204. eCollection 2021 Feb.

Predicting geographic location from genetic variation with deep neural networks.

Elife. 2020 Jun 8;9:e54507. doi: 10.7554/eLife.54507.

Scalable probabilistic PCA for large-scale genetic variation data.

PLoS Genet. 2020 May 29;16(5):e1008773. doi: 10.1371/journal.pgen.1008773. eCollection 2020 May.

Spatially explicit analysis reveals complex human genetic gradients in the Iberian Peninsula.

Sci Rep. 2019 May 24;9(1):7825. doi: 10.1038/s41598-019-44121-6.

Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula.

Nat Commun. 2019 Feb 1;10(1):551. doi: 10.1038/s41467-018-08272-w.

Apolipoprotein L1 risk variants associate with prevalent atherosclerotic disease in African American systemic lupus erythematosus patients.

PLoS One. 2017 Aug 29;12(8):e0182483. doi: 10.1371/journal.pone.0182483. eCollection 2017.

Population Structure of UK Biobank and Ancient Eurasians Reveals Adaptation at Genes Influencing Blood Pressure.

Am J Hum Genet. 2016 Nov 3;99(5):1130-1139. doi: 10.1016/j.ajhg.2016.09.014. Epub 2016 Oct 20.

The contribution of rare variation to prostate cancer heritability.

Nat Genet. 2016 Jan;48(1):30-5. doi: 10.1038/ng.3446. Epub 2015 Nov 16.

HaploPOP: a software that improves population assignment by combining markers into haplotypes.

BMC Bioinformatics. 2015 Jul 31;16:242. doi: 10.1186/s12859-015-0661-6.

Detecting individual ancestry in the human genome.

Investig Genet. 2015 May 1;6:7. doi: 10.1186/s13323-015-0019-x. eCollection 2015.
