A simulation approach for change-points on phylogenetic trees.

Author information

Persing Adam, Jasra Ajay, Beskos Alexandros, Balding David, De Iorio Maria

Affiliations

1 Department of Statistical Science, University College London, London, United Kingdom.

Publication information

J Comput Biol. 2015 Jan;22(1):10-24. doi: 10.1089/cmb.2014.0218.

Abstract

We observe n sequences at each of m sites and assume that they have evolved from an ancestral sequence that forms the root of a binary tree of known topology and branch lengths, but the sequence states at internal nodes are unknown. The topology of the tree and branch lengths are the same for all sites, but the parameters of the evolutionary model can vary over sites. We assume a piecewise constant model for these parameters, with an unknown number of change-points and hence a transdimensional parameter space over which we seek to perform Bayesian inference. We propose two novel ideas to deal with the computational challenges of such inference. Firstly, we approximate the model based on the time machine principle: the top nodes of the binary tree (near the root) are replaced by an approximation of the true distribution; as more nodes are removed from the top of the tree, the cost of computing the likelihood is reduced linearly in n. The approach introduces a bias, which we investigate empirically. Secondly, we develop a particle marginal Metropolis-Hastings (PMMH) algorithm that employs a sequential Monte Carlo (SMC) sampler and can use the first idea. Our time-machine PMMH algorithm copes well with one of the bottlenecks of standard computational algorithms: the transdimensional nature of the posterior distribution. The algorithm is implemented on simulated and real data examples, and we empirically demonstrate its potential to outperform competing methods based on approximate Bayesian computation (ABC) techniques.
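
To make the model setup in the abstract concrete, the following is a minimal sketch of the exact (unapproximated) likelihood: the unknown internal-node states of a fixed binary tree are summed out site by site, and the evolutionary parameter is piecewise constant across sites. It does not implement the time machine approximation or the PMMH sampler. The abstract does not name a substitution model, so the Jukes-Cantor model, the uniform root distribution, the single scalar rate parameter, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import bisect
import math

STATES = "ACGT"

def jc69_transition(rate, t):
    """4x4 transition matrix under Jukes-Cantor (an assumed model) for branch length t."""
    e = math.exp(-4.0 * rate * t / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial(node, site, rate):
    """Conditional likelihood vector at `node` for one site.

    A leaf is a string (its observed sequence); an internal node is a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        return [1.0 if STATES[k] == node[site] else 0.0 for k in range(4)]
    left, t_left, right, t_right = node
    out = [1.0] * 4
    for child, t in ((left, t_left), (right, t_right)):
        p = jc69_transition(rate, t)
        below = partial(child, site, rate)
        for i in range(4):
            out[i] *= sum(p[i][j] * below[j] for j in range(4))
    return out

def site_likelihood(tree, site, rate):
    """Likelihood of one site, summing over the unknown root state (uniform prior assumed)."""
    return sum(0.25 * v for v in partial(tree, site, rate))

def rate_at(site, change_points, rates):
    """Piecewise-constant parameter: a new segment starts at each change-point index."""
    return rates[bisect.bisect_right(change_points, site)]

def log_likelihood(tree, m, change_points, rates):
    """Log-likelihood over m sites; rates must have len(change_points) + 1 entries."""
    return sum(
        math.log(site_likelihood(tree, s, rate_at(s, change_points, rates)))
        for s in range(m)
    )
```

Each call to `site_likelihood` visits every node of the tree once, so the per-site cost grows with the number of observed sequences n. A sampler over `change_points` and `rates` would evaluate `log_likelihood` repeatedly, which is why reducing this cost matters.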
