fastSTRUCTURE: variational inference of population structure in large SNP data sets.

Author information

Anil Raj, Matthew Stephens, Jonathan K. Pritchard

Affiliations

Department of Genetics, Stanford University, Stanford, California 94305

Departments of Statistics and Human Genetics, University of Chicago, Chicago, Illinois 60637.

Publication information

Genetics. 2014 Jun;197(2):573-89. doi: 10.1534/genetics.114.164350. Epub 2014 Apr 2.

DOI:10.1534/genetics.114.164350

PMID:24700103

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4063916/

Abstract

Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
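The abstract's core idea is to recast posterior computation as an optimization over a factorized approximating distribution. That idea can be illustrated on a simplified version of the STRUCTURE admixture model. The sketch below is a minimal mean-field coordinate-ascent loop in pure Python. It assumes biallelic diploid genotypes coded 0/1/2, a Dirichlet prior on each individual's admixture proportions, and independent Beta priors on allele frequencies. It is not fastSTRUCTURE's implementation: it omits the hierarchical prior for weak structure, missing-data handling, lower-bound convergence checks and any acceleration scheme. The names `fit_admixture_vb`, `digamma` and `softmax`, and all parameter defaults, are hypothetical.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x moves x into the range where the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fit_admixture_vb(genotypes, K, n_iter=150, alpha0=1.0, beta0=1.0, gamma0=1.0, seed=0):
    """Hypothetical mean-field VB for a simplified STRUCTURE admixture model.

    genotypes[n][l] in {0, 1, 2} counts copies of one allele at biallelic locus l.
    Returns posterior-mean admixture proportions (N x K) and allele frequencies (L x K).
    """
    rng = random.Random(seed)
    N, L = len(genotypes), len(genotypes[0])
    # q(Q_n) = Dirichlet(alpha[n]); a random start breaks the symmetry between labels.
    alpha = [[alpha0 + rng.random() for _ in range(K)] for _ in range(N)]
    # q(P_lk) = Beta(beta[l][k], gamma[l][k]); start at the prior.
    beta = [[beta0] * K for _ in range(L)]
    gamma = [[gamma0] * K for _ in range(L)]
    for _ in range(n_iter):
        # Expected log-parameters under the current Dirichlet and Beta factors.
        elog_q = []
        for a in alpha:
            total = digamma(sum(a))
            elog_q.append([digamma(ak) - total for ak in a])
        elog_p, elog_1mp = [], []
        for b_row, g_row in zip(beta, gamma):
            totals = [digamma(b + g) for b, g in zip(b_row, g_row)]
            elog_p.append([digamma(b) - t for b, t in zip(b_row, totals)])
            elog_1mp.append([digamma(g) - t for g, t in zip(g_row, totals)])
        # Accumulate fresh expected counts on top of the priors.
        new_alpha = [[alpha0] * K for _ in range(N)]
        new_beta = [[beta0] * K for _ in range(L)]
        new_gamma = [[gamma0] * K for _ in range(L)]
        for n in range(N):
            for l in range(L):
                g = genotypes[n][l]
                # q(Z): population responsibilities for a copy of the counted allele (r1)
                # and for a copy of the other allele (r0).
                r1 = softmax([elog_q[n][k] + elog_p[l][k] for k in range(K)])
                r0 = softmax([elog_q[n][k] + elog_1mp[l][k] for k in range(K)])
                for k in range(K):
                    new_alpha[n][k] += g * r1[k] + (2 - g) * r0[k]
                    new_beta[l][k] += g * r1[k]
                    new_gamma[l][k] += (2 - g) * r0[k]
        # Given q(Z), these conjugate Dirichlet/Beta parameters are the exact block optima.
        alpha, beta, gamma = new_alpha, new_beta, new_gamma
    admixture = [[ak / sum(a) for ak in a] for a in alpha]
    freqs = [[b / (b + g) for b, g in zip(b_row, g_row)] for b_row, g_row in zip(beta, gamma)]
    return admixture, freqs
```

Each sweep first updates q(Z) given the other factors, then updates q(Q) and q(P) given q(Z). Every block update is therefore an exact coordinate-ascent step on the variational lower bound. Consider two simulated, strongly differentiated groups of diploid individuals. There, the returned admixture rows should concentrate on different components for the two groups. Which group receives which label depends on the random start. The fixed iteration count stands in for tracking the lower bound, which a fuller implementation would use to decide when to stop.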

Similar articles

De novo inference of stratification and local admixture in sequencing studies.

BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S17. doi: 10.1186/1471-2105-14-S5-S17. Epub 2013 Apr 10.

A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction.

G3 (Bethesda). 2017 Jun 7;7(6):1833-1853. doi: 10.1534/g3.117.041202.

POPSTR: Inference of Admixed Population Structure Based on Single-Nucleotide Polymorphisms and Copy Number Variations.

J Comput Biol. 2018 Apr;25(4):417-429. doi: 10.1089/cmb.2017.0127. Epub 2018 Jan 2.

Inference of Population Structure from Time-Series Genotype Data.

Am J Hum Genet. 2019 Aug 1;105(2):317-333. doi: 10.1016/j.ajhg.2019.06.002. Epub 2019 Jun 27.

A comparison of bayesian methods for haplotype reconstruction from population genotype data.

Am J Hum Genet. 2003 Nov;73(5):1162-9. doi: 10.1086/379378. Epub 2003 Oct 20.

mStruct: inference of population structure in light of both genetic admixing and allele mutations.

Genetics. 2009 Jun;182(2):575-93. doi: 10.1534/genetics.108.100222. Epub 2009 Apr 10.

PSMIX: an R package for population structure inference via maximum likelihood method.

BMC Bioinformatics. 2006 Jun 22;7:317. doi: 10.1186/1471-2105-7-317.

A coalescence-guided hierarchical Bayesian method for haplotype inference.

Am J Hum Genet. 2006 Aug;79(2):313-22. doi: 10.1086/506276. Epub 2006 Jun 28.

Stochastic Variational Inference for Bayesian Phylogenetics: A Case of CAT Model.

Mol Biol Evol. 2019 Apr 1;36(4):825-833. doi: 10.1093/molbev/msz020.

Cited by

One mother for two species via obligate cross-species cloning in ants.

Nature. 2025 Sep 3. doi: 10.1038/s41586-025-09425-w.

Classification of Heterotic Groups and Prediction of Heterosis in Sorghum Based on Whole-Genome Resequencing.

Int J Mol Sci. 2025 Aug 18;26(16):7950. doi: 10.3390/ijms26167950.

Brain wiring economics, network organisation and population-level genomics.

Imaging Neurosci (Camb). 2025 Jun 4;3. doi: 10.1162/IMAG.a.31. eCollection 2025.

Population structure, gene flow and genetic diversity of sheep blowfly (Lucilia cuprina dorsalis) in Australia.

BMC Genomics. 2025 Aug 12;26(1):743. doi: 10.1186/s12864-025-11852-y.

Molecular profiling and sex determination of germplasm collection: Exploring microsatellite markers and high-resolution melting (HRM) analysis.

PeerJ. 2025 Aug 8;13:e19770. doi: 10.7717/peerj.19770. eCollection 2025.

A reproducible ddRAD-seq protocol reveals novel genomic association signatures for fruit-related traits in peach.

Plant Methods. 2025 Jul 22;21(1):101. doi: 10.1186/s13007-025-01415-3.

A high-throughput screening method for selecting feature SNPs to evaluate breed diversity and infer ancestry.

Genome Res. 2025 Aug 1;35(8):1875-1886. doi: 10.1101/gr.280176.124.

Genomic analysis of Plasmodium vivax field isolates circulating in sub-Saharan Africa.

Commun Biol. 2025 Jul 7;8(1):1012. doi: 10.1038/s42003-025-08276-5.

Whole-genome sequencing of 1,060 Brettanomyces bruxellensis isolates reveals significant phenotypic impact of acquired subgenomes in allopolyploids.

Nat Commun. 2025 Jul 1;16(1):5500. doi: 10.1038/s41467-025-60706-4.

An ecological, phenotypic, and genomic survey of duckweeds with their associated aquatic environments in the United Kingdom.

AoB Plants. 2025 Mar 31;17(3):plaf018. doi: 10.1093/aobpla/plaf018. eCollection 2025 Jun.
