Chifman Julia, Kubatko Laura
Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC 27157, Department of Statistics, The Ohio State University, Columbus, OH 43210 and Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA.
Bioinformatics. 2014 Dec 1;30(23):3317-24. doi: 10.1093/bioinformatics/btu530. Epub 2014 Aug 7.
Increasing attention has been devoted to estimation of species-level phylogenetic relationships under the coalescent model. However, existing methods either use summary statistics (gene trees) to carry out estimation, ignoring an important source of variability in the estimates, or involve computationally intensive Bayesian Markov chain Monte Carlo algorithms that do not scale well to whole-genome datasets.
We develop a method to infer relationships among quartets of taxa under the coalescent model using techniques from algebraic statistics. Uncertainty in the estimated relationships is quantified using the nonparametric bootstrap. The performance of our method is assessed with simulated data. We then describe how our method could be used for species tree inference in larger taxon samples, and demonstrate its utility using datasets for Sistrurus rattlesnakes and for soybeans.
The method to infer the phylogenetic relationship among quartets is implemented in the software SVDquartets, available at www.stat.osu.edu/~lkubatko/software/SVDquartets.
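The abstract does not spell out the computation behind the quartet inference, so the following is only a minimal sketch of the general idea suggested by the software name. It assumes that site-pattern frequencies for four taxa are arranged into a 16 x 16 "flattening" matrix for each of the three possible splits, and that the split whose matrix lies closest to rank 10 (measured from the trailing singular values) is preferred. The function names, the rank threshold as a parameter, the handling of gaps, and the site-resampling bootstrap are all assumptions made for illustration; none of this is the SVDquartets interface or code.

```python
import itertools
import numpy as np

BASES = "ACGT"
# Index each ordered pair of nucleotides (AA, AC, ..., TT) from 0 to 15.
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(itertools.product(BASES, repeat=2))}

# The three possible unrooted splits of four taxa, given as sequence positions.
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def flattening(seqs, split):
    """Return the 16 x 16 matrix of site-pattern frequencies for one split.

    Rows are indexed by the states of the first pair of taxa, columns by
    the states of the second pair.
    """
    (a, b), (c, d) = split
    F = np.zeros((16, 16))
    for site in zip(*seqs):
        if any(x not in BASES for x in site):
            continue  # assumption: skip sites with gaps or ambiguity codes
        F[PAIR_INDEX[site[a] + site[b]], PAIR_INDEX[site[c] + site[d]]] += 1
    total = F.sum()
    return F / total if total else F


def svd_score(F, rank=10):
    """Frobenius distance from F to the nearest matrix of the given rank."""
    s = np.linalg.svd(F, compute_uv=False)  # singular values, descending
    return float(np.sqrt(np.sum(s[rank:] ** 2)))


def best_split(seqs):
    """Score all three splits and return (lowest-scoring split, all scores)."""
    scores = [svd_score(flattening(seqs, sp)) for sp in SPLITS]
    return SPLITS[int(np.argmin(scores))], scores


def bootstrap_support(seqs, n_reps=100, seed=0):
    """Nonparametric bootstrap: resample alignment columns with replacement
    and report how often each split is selected."""
    rng = np.random.default_rng(seed)
    cols = np.array([list(s) for s in seqs])  # shape (4, n_sites)
    n_sites = cols.shape[1]
    counts = {sp: 0 for sp in SPLITS}
    for _ in range(n_reps):
        idx = rng.integers(0, n_sites, size=n_sites)
        resampled = ["".join(row) for row in cols[:, idx]]
        counts[best_split(resampled)[0]] += 1
    return {sp: c / n_reps for sp, c in counts.items()}
```

Calling `best_split(seqs)` on four aligned sequences of equal length returns the preferred split together with the three scores, and `bootstrap_support(seqs)` returns the proportion of bootstrap replicates that select each split. Extending this to larger taxon samples would require scoring many quartets and assembling them into a tree, which is outside the scope of this sketch.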