Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, VIC, 3020, Australia.
School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, 2006, Australia.
BMC Evol Biol. 2018 Jun 19;18(1):95. doi: 10.1186/s12862-018-1210-5.
Recent developments in sequencing technologies make it possible to obtain genome sequences from a large number of isolates in a very short time. Bayesian phylogenetic approaches can take advantage of these data by simultaneously inferring the phylogenetic tree, evolutionary timescale, and demographic parameters (such as population growth rates), while naturally integrating uncertainty in all parameters. Despite their desirable properties, Bayesian approaches can be computationally intensive, hindering their use for outbreak investigations involving genome data for large numbers of pathogen isolates. An alternative to full Bayesian inference is a hybrid approach, in which the phylogenetic tree and evolutionary timescale are estimated first using maximum likelihood. Under this hybrid approach, demographic parameters are inferred from the estimated trees instead of the sequence data, using maximum likelihood, Bayesian inference, or approximate Bayesian computation. This can vastly reduce the computational burden, but has the disadvantage of ignoring the uncertainty in the phylogenetic tree and evolutionary timescale.
We compared the performance of a fully Bayesian method and a hybrid method by analysing six whole-genome SNP data sets from a range of bacteria, together with simulated data. The estimates from the two methods were very similar, suggesting that the hybrid method is a valid alternative for very large data sets. However, we also found that congruence between these methods is contingent on the presence of strong temporal structure in the data (i.e. clocklike behaviour), which is typically verified using a date-randomisation test in a Bayesian framework. To reduce the computational burden of this Bayesian test, we implemented a date-randomisation test using a rapid maximum likelihood method, which has similar performance to its Bayesian counterpart.
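The logic of a date-randomisation test can be illustrated with a minimal sketch. This is not the implementation described here: as a stand-in for the maximum likelihood rate estimator it uses a simple root-to-tip regression slope, and it assumes a simplified pass criterion (the rate from the true sampling dates must fall outside the range of rates obtained after shuffling the dates among tips). The function names, the number of replicates, and the toy data are all hypothetical.

```python
import random
from statistics import mean


def regression_slope(dates, distances):
    """Least-squares slope of root-to-tip distance against sampling date.

    Used here as a crude stand-in for a substitution-rate estimate.
    """
    mean_date, mean_dist = mean(dates), mean(distances)
    sxx = sum((x - mean_date) ** 2 for x in dates)
    sxy = sum((x - mean_date) * (y - mean_dist)
              for x, y in zip(dates, distances))
    return sxy / sxx


def date_randomisation_test(dates, distances, n_replicates=20, seed=1):
    """Compare the rate from the true dates with rates from shuffled dates.

    Assumed criterion: temporal structure is supported when the observed
    rate lies outside the range of the randomised-date rates.
    """
    rng = random.Random(seed)
    observed = regression_slope(dates, distances)
    null_rates = []
    for _ in range(n_replicates):
        shuffled = list(dates)
        rng.shuffle(shuffled)  # break the link between tips and dates
        null_rates.append(regression_slope(shuffled, distances))
    passes = observed > max(null_rates) or observed < min(null_rates)
    return observed, null_rates, passes


# Toy data: ten tips sampled yearly, with perfectly clocklike distances.
dates = [float(year) for year in range(2000, 2010)]
distances = [0.001 * (d - 1990.0) for d in dates]
observed, null_rates, passes = date_randomisation_test(dates, distances)
```

In practice the rate for each replicate would come from a dated-tree estimator rather than a regression, and uncertainty intervals rather than point estimates are usually compared; the sketch only shows the shuffle-and-compare structure of the test.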
Hybrid approaches can produce reliable inferences of evolutionary timescales and phylodynamic parameters in a fraction of the time required for fully Bayesian analyses. As such, they are a valuable alternative in outbreak studies involving a large number of isolates.