Sainudiin Raazesh, York Thomas
Department of Statistics, University of Oxford, Oxford, OX1 3TG, UK.
Algorithms Mol Biol. 2009 Jan 7;4:1. doi: 10.1186/1748-7188-4-1.
In phylogenetic inference one is interested in obtaining samples from the posterior distribution over the tree space on the basis of some observed DNA sequence data. One of the simplest sampling methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler, via interval analysis, to rigorously draw samples from posterior distributions over small phylogenetic tree spaces.
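As a rough illustration of the rejection-sampling idea mentioned above, the following Python sketch draws from a toy one-dimensional density using a piecewise-constant envelope built from an upper bound on each cell of a partition. It is not the authors' implementation. The names `target`, `upper_bound`, `build_envelope` and `rejection_sample` are hypothetical, the toy density stands in for a posterior over a branch length or similar parameter, and the bounds are derived by hand from monotonicity rather than by interval arithmetic with outward rounding, so the sketch is not rigorous in the auto-validating sense.

```python
import math
import random


def target(x):
    """Unnormalized toy density (assumed for illustration only)."""
    return math.exp(-0.5 * x * x)


def upper_bound(lo, hi):
    """Upper bound of target over [lo, hi].

    The toy density decreases in |x|, so its maximum over a cell is
    attained at the point of the cell closest to zero.
    """
    nearest = 0.0 if lo <= 0.0 <= hi else min(abs(lo), abs(hi))
    return target(nearest)


def build_envelope(a, b, n_cells):
    """Partition [a, b] into equal cells and bound the target on each."""
    width = (b - a) / n_cells
    cells = [(a + i * width, a + (i + 1) * width) for i in range(n_cells)]
    bounds = [upper_bound(lo, hi) for lo, hi in cells]
    # Area under the envelope on each cell; used as proposal weights.
    weights = [(hi - lo) * u for (lo, hi), u in zip(cells, bounds)]
    return cells, bounds, weights


def rejection_sample(n, a=-3.0, b=3.0, n_cells=32, rng=None):
    """Draw n independent samples from target restricted to [a, b]."""
    rng = rng or random.Random()
    cells, bounds, weights = build_envelope(a, b, n_cells)
    samples = []
    while len(samples) < n:
        # Propose: pick a cell in proportion to its envelope area,
        # then a point uniformly inside that cell.
        i = rng.choices(range(len(cells)), weights=weights)[0]
        lo, hi = cells[i]
        x = rng.uniform(lo, hi)
        # Accept with probability target(x) / envelope(x).
        if rng.random() * bounds[i] <= target(x):
            samples.append(x)
    return samples


if __name__ == "__main__":
    draws = rejection_sample(1000, rng=random.Random(1))
    print(len(draws), min(draws), max(draws))
```

Refining the partition (a larger `n_cells`) tightens the envelope and raises the acceptance rate, at the cost of more bound evaluations up front.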
The posterior samples from the auto-validating sampler are used to rigorously (i) estimate posterior probabilities for different rooted topologies based on mitochondrial DNA from human, chimpanzee and gorilla, (ii) conduct a non-parametric test of rate variation between protein-coding and tRNA-coding sites from three primates and (iii) obtain a posterior estimate of the human-Neanderthal divergence time.
This solves the open problem of rigorously drawing independent and identically distributed samples from the posterior distribution over rooted and unrooted small tree spaces (3 or 4 taxa) based on any multiply-aligned sequence data.