Fast model-based estimation of ancestry in unrelated individuals.

Authors

Alexander David H, Novembre John, Lange Kenneth

Affiliation

Department of Biomathematics, University of California at Los Angeles, Los Angeles, California 90095, USA.

Publication

Genome Res. 2009 Sep;19(9):1655-64. doi: 10.1101/gr.094052.109. Epub 2009 Jul 31.

DOI: 10.1101/gr.094052.109

PMID: 19648217

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2752134/

Abstract

Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
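ADMIXTURE maximizes the same likelihood as structure, and the abstract benchmarks it against the EM algorithm in FRAPPE. To make that baseline concrete, the sketch below writes out the log-likelihood L(Q, F) = sum_i sum_j [ g_ij ln(sum_k q_ik f_kj) + (2 - g_ij) ln(sum_k q_ik (1 - f_kj)) ] and one plain EM iteration in pure Python. It assumes biallelic SNPs coded 0/1/2 as counts of allele 1, no missing genotypes, and a toy simulated data set; the helper names (`log_likelihood`, `em_step`, `simulate`, `fit_em`) are hypothetical. This is only the EM baseline, not ADMIXTURE's block relaxation with sequential quadratic programming or its quasi-Newton acceleration.

```python
import math
import random


def log_likelihood(G, Q, F):
    """structure-style admixture log-likelihood (binomial constant dropped).

    G[i][j] in {0, 1, 2}: copies of allele 1 in individual i at SNP j.
    Q[i][k]: fraction of individual i's ancestry from population k (rows sum to 1).
    F[k][j]: frequency of allele 1 at SNP j in ancestral population k.
    """
    K = len(F)
    ll = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            p = sum(Q[i][k] * F[k][j] for k in range(K))  # P(allele 1) for this genotype
            if g > 0:
                ll += g * math.log(p)
            if g < 2:
                ll += (2 - g) * math.log(1.0 - p)
    return ll


def em_step(G, Q, F):
    """One EM update of (Q, F); each allele copy has a latent population of origin."""
    I, J, K = len(G), len(G[0]), len(F)
    q_acc = [[0.0] * K for _ in range(I)]
    f_num = [[0.0] * J for _ in range(K)]
    f_den = [[0.0] * J for _ in range(K)]
    for i in range(I):
        for j in range(J):
            g = G[i][j]
            p = sum(Q[i][k] * F[k][j] for k in range(K))
            for k in range(K):
                # Posterior that an allele-1 copy (a) or allele-2 copy (b) came from k.
                a = Q[i][k] * F[k][j] / p if g > 0 else 0.0
                b = Q[i][k] * (1.0 - F[k][j]) / (1.0 - p) if g < 2 else 0.0
                expected = g * a + (2 - g) * b  # expected copies at (i, j) from k
                q_acc[i][k] += expected
                f_num[k][j] += g * a
                f_den[k][j] += expected
    Q_new = [[q_acc[i][k] / (2 * J) for k in range(K)] for i in range(I)]
    F_new = [[f_num[k][j] / f_den[k][j] for j in range(J)] for k in range(K)]
    return Q_new, F_new


def random_simplex_rows(n, K, rng):
    """n rows drawn from a flat Dirichlet(1, ..., 1) via normalized Gamma draws."""
    rows = []
    for _ in range(n):
        w = [rng.gammavariate(1.0, 1.0) for _ in range(K)]
        s = sum(w)
        rows.append([x / s for x in w])
    return rows


def simulate(I, J, K, seed=0):
    """Hypothetical toy simulator: draw Q and F, then genotypes G under the model."""
    rng = random.Random(seed)
    F = [[rng.uniform(0.05, 0.95) for _ in range(J)] for _ in range(K)]
    Q = random_simplex_rows(I, K, rng)
    G = []
    for i in range(I):
        row = []
        for j in range(J):
            p = sum(Q[i][k] * F[k][j] for k in range(K))
            row.append(int(rng.random() < p) + int(rng.random() < p))  # two allele draws
        G.append(row)
    return G, Q, F


def fit_em(G, K, iters=50, seed=1):
    """Plain EM from a random start; returns Q, F and the log-likelihood trace."""
    rng = random.Random(seed)
    Q = random_simplex_rows(len(G), K, rng)
    F = [[rng.uniform(0.2, 0.8) for _ in range(len(G[0]))] for _ in range(K)]
    trace = [log_likelihood(G, Q, F)]
    for _ in range(iters):
        Q, F = em_step(G, Q, F)
        trace.append(log_likelihood(G, Q, F))
    return Q, F, trace


G, Q_true, F_true = simulate(I=30, J=100, K=2)
Q_hat, F_hat, trace = fit_em(G, K=2, iters=30)
print(f"log-likelihood: start {trace[0]:.1f}, after EM {trace[-1]:.1f}")
```

Each EM pass visits every (individual, SNP, population) triple once and never decreases the log-likelihood, so the trace is non-decreasing. The abstract reports that ADMIXTURE reaches the maximum of this same likelihood faster and more accurately than FRAPPE's EM implementation.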

Similar articles

Fast and efficient estimation of individual ancestry coefficients.

Genetics. 2014 Apr;196(4):973-83. doi: 10.1534/genetics.113.160572. Epub 2014 Feb 4.

A fast least-squares algorithm for population inference.

BMC Bioinformatics. 2013 Jan 23;14:28. doi: 10.1186/1471-2105-14-28.

Enhancements to the ADMIXTURE algorithm for individual ancestry estimation.

BMC Bioinformatics. 2011 Jun 18;12:246. doi: 10.1186/1471-2105-12-246.

Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations.

BMC Bioinformatics. 2015 Jan 16;16:4. doi: 10.1186/s12859-014-0418-7.

FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data.

BMC Bioinformatics. 2016 Mar 9;17:122. doi: 10.1186/s12859-016-0965-1.

A classical likelihood based approach for admixture mapping using EM algorithm.

Hum Genet. 2006 Oct;120(3):431-45. doi: 10.1007/s00439-006-0224-z. Epub 2006 Aug 5.

Complex genetic admixture histories reconstructed with Approximate Bayesian Computation.

Mol Ecol Resour. 2021 May;21(4):1098-1117. doi: 10.1111/1755-0998.13325. Epub 2021 Feb 26.

[The use of the expectation-maximization (EM) algorithm for maximum likelihood estimation of gametic frequencies of multilocus polymorphic codominant systems based on sampled population data].

Genetika. 2002 Mar;38(3):407-18.

Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.

Genet Epidemiol. 2015 May;39(4):276-93. doi: 10.1002/gepi.21896. Epub 2015 Mar 23.

Cited by

Whole-Genome Resequencing Provides Novel Insights Into the Genetic Diversity, Population Structure, and Patterns of Runs of Homozygosity in Mud Crab ().

Evol Appl. 2025 Sep 3;18(9):e70153. doi: 10.1111/eva.70153. eCollection 2025 Sep.

Phylogeographic and Genomic Insights Unveil the Evolutionary History and Post-Glacial Recolonization Routes of the Palmate Newt () Into Europe.

Ecol Evol. 2025 Sep 3;15(9):e71994. doi: 10.1002/ece3.71994. eCollection 2025 Sep.

Analysis of candidate genes identified via genome-wide association analysis of sugar-related traits in maize kernels.

Plant Genome. 2025 Sep;18(3):e70101. doi: 10.1002/tpg2.70101.

Ancient genomes provide evidence of demographic shift to Slavic-associated groups in Moravia.

Genome Biol. 2025 Sep 3;26(1):259. doi: 10.1186/s13059-025-03700-9.

Ancient DNA connects large-scale migration with the spread of Slavs.

Nature. 2025 Sep 3. doi: 10.1038/s41586-025-09437-6.

Investigating the role of transposable elements in shaping abdominal fat and egg production phenotypic traits in geese.

BMC Genomics. 2025 Sep 3;26(1):803. doi: 10.1186/s12864-025-11976-1.

Pear scab resistance gene Rvn1 from Ussurian pear is located in a cluster of receptor-like protein ethylene-inducing Xylanase (EIX) genes.

BMC Plant Biol. 2025 Sep 2;25(1):1191. doi: 10.1186/s12870-025-07209-y.

Gone With the Wind: Exploring a Vanished Rock Dove, , Hybrid Zone in the Sahara Desert.

Ecol Evol. 2025 Aug 27;15(9):e72061. doi: 10.1002/ece3.72061. eCollection 2025 Sep.

Genome-wide selection signal analysis reveals the adaptability of Tibetan sheep to high altitudes.

Front Vet Sci. 2025 Aug 14;12:1632017. doi: 10.3389/fvets.2025.1632017. eCollection 2025.

Structural and deleterious burdens and their effects on yield traits in foxtail millet domestication.

iScience. 2025 Aug 6;28(9):113295. doi: 10.1016/j.isci.2025.113295. eCollection 2025 Sep 19.
