fastSTRUCTURE: variational inference of population structure in large SNP data sets.

Authors

Raj Anil, Stephens Matthew, Pritchard Jonathan K

Affiliations

Department of Genetics, Stanford University, Stanford, California 94305

Departments of Statistics and Human Genetics, University of Chicago, Chicago, Illinois 60637.

Publication information

Genetics. 2014 Jun;197(2):573-89. doi: 10.1534/genetics.114.164350. Epub 2014 Apr 2.

Abstract

Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
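The abstract's statement that variational methods recast posterior computation as an optimization problem can be illustrated with a toy example. The sketch below is not the fastSTRUCTURE model or code: it assumes a hypothetical three-state latent variable with made-up prior and likelihood values, and all function names are illustrative. It shows only the general principle that the evidence lower bound (ELBO) never exceeds the log evidence and is maximized when the approximating distribution equals the exact posterior.

```python
import math

def log_evidence(prior, lik):
    """log p(x) = log sum_z p(z) p(x | z) for a discrete latent z."""
    return math.log(sum(p * l for p, l in zip(prior, lik)))

def elbo(q, prior, lik):
    """Evidence lower bound: E_q[log p(x, z) - log q(z)]."""
    return sum(
        qi * (math.log(p * l) - math.log(qi))
        for qi, p, l in zip(q, prior, lik)
        if qi > 0  # terms with q(z) = 0 contribute nothing
    )

# Hypothetical toy model (assumed values, not taken from the paper).
prior = [0.5, 0.3, 0.2]   # p(z)
lik = [0.10, 0.40, 0.70]  # p(x | z) for one observed x

evidence = sum(p * l for p, l in zip(prior, lik))
posterior = [p * l / evidence for p, l in zip(prior, lik)]  # exact p(z | x)
uniform = [1 / 3, 1 / 3, 1 / 3]                             # a poor approximation

# The bound is tight at the exact posterior and strictly looser elsewhere.
print("log evidence      :", log_evidence(prior, lik))
print("ELBO at posterior :", elbo(posterior, prior, lik))
print("ELBO at uniform q :", elbo(uniform, prior, lik))
```

In a model with many latent variables the exact posterior is intractable, so a variational method searches a restricted family of distributions for the member with the highest ELBO. That search is the optimization problem the abstract refers to.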
