Persing Adam, Jasra Ajay, Beskos Alexandros, Balding David, De Iorio Maria
Department of Statistical Science, University College London, London, United Kingdom.
J Comput Biol. 2015 Jan;22(1):10-24. doi: 10.1089/cmb.2014.0218.
We observe n sequences at each of m sites and assume that they have evolved from an ancestral sequence that forms the root of a binary tree. The topology and branch lengths of the tree are known, but the sequence states at internal nodes are unknown. The topology and branch lengths are the same for all sites, but the parameters of the evolutionary model can vary over sites. We assume a piecewise constant model for these parameters, with an unknown number of change-points and hence a transdimensional parameter space over which we seek to perform Bayesian inference. We propose two novel ideas to deal with the computational challenges of such inference. First, we approximate the model based on the time machine principle: the top nodes of the binary tree (near the root) are replaced by an approximation of the true distribution; as more nodes are removed from the top of the tree, the cost of computing the likelihood is reduced linearly in n. The approach introduces a bias, which we investigate empirically. Second, we develop a particle marginal Metropolis-Hastings (PMMH) algorithm that employs a sequential Monte Carlo (SMC) sampler and can use the first idea. Our time-machine PMMH algorithm copes well with one of the bottlenecks of standard computational algorithms: the transdimensional nature of the posterior distribution. The algorithm is implemented on simulated and real data examples, and we empirically demonstrate its potential to outperform competing methods based on approximate Bayesian computation (ABC) techniques.
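To make the likelihood computation concrete, the following is a minimal sketch of the full (unapproximated) likelihood on a tree of known topology and branch lengths, with a per-site rate that is piecewise constant between change-points. It is an illustration under stated assumptions and not the authors' implementation: the abstract does not name the evolutionary model, so the Jukes-Cantor (JC69) model with a uniform root distribution is assumed here, the recursion is the standard pruning recursion that sums over unknown internal states, and the names `jc69_transition`, `partial_likelihood`, `site_likelihood`, `rate_at`, and `log_likelihood`, as well as the dictionary-based tree layout, are hypothetical. The time machine approximation and the PMMH sampler are not implemented.

```python
import math

STATES = "ACGT"

def jc69_transition(t, rate):
    """JC69 transition matrix for a branch of length t (assumed model)."""
    e = math.exp(-4.0 * rate * t / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial_likelihood(node, site, rate):
    """Return L[s] = P(observed leaves below node | node is in state s)."""
    if "seq" in node:  # leaf: the state is observed
        observed = STATES.index(node["seq"][site])
        return [1.0 if s == observed else 0.0 for s in range(4)]
    result = [1.0] * 4
    for child, t in node["children"]:  # internal node: sum over child states
        P = jc69_transition(t, rate)
        below = partial_likelihood(child, site, rate)
        for s in range(4):
            result[s] *= sum(P[s][c] * below[c] for c in range(4))
    return result

def site_likelihood(root, site, rate):
    """Likelihood of one site, assuming a uniform distribution at the root."""
    return sum(0.25 * l for l in partial_likelihood(root, site, rate))

def rate_at(site, change_points, rates):
    """Piecewise constant rate: rates[k] applies on the k-th segment of sites."""
    k = sum(1 for c in change_points if c <= site)
    return rates[k]

def log_likelihood(root, m, change_points, rates):
    """Sum of per-site log-likelihoods over m independent sites."""
    return sum(
        math.log(site_likelihood(root, j, rate_at(j, change_points, rates)))
        for j in range(m)
    )

# Hypothetical example: n = 3 sequences, m = 4 sites, one change-point at site 2.
inner = {"children": [({"seq": "ACGT"}, 0.1), ({"seq": "ACGA"}, 0.1)]}
tree = {"children": [(inner, 0.05), ({"seq": "ATGA"}, 0.15)]}
ll = log_likelihood(tree, 4, change_points=[2], rates=[1.0, 3.0])
```

Each site visits every node of the tree once, so the cost per site grows with the number of nodes. This is the cost that replacing the top of the tree with an approximation is meant to reduce.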