Poon Art F Y
BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada; Department of Medicine, University of British Columbia, Vancouver, BC, Canada; Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
Mol Biol Evol. 2015 Sep;32(9):2483-95. doi: 10.1093/molbev/msv123. Epub 2015 May 25.
The shapes of phylogenetic trees relating virus populations are determined by the adaptation of viruses within each host, and by the transmission of viruses among hosts. Phylodynamic inference attempts to reverse this flow of information, estimating parameters of these processes from the shape of a virus phylogeny reconstructed from a sample of genetic sequences from the epidemic. A key challenge to phylodynamic inference is quantifying the similarity between two trees in an efficient and comprehensive way. In this study, I demonstrate that a new distance measure, based on a subset tree kernel function from computational linguistics, confers a significant improvement over previous measures of tree shape for classifying trees generated under different epidemiological scenarios. Next, I incorporate this kernel-based distance measure into an approximate Bayesian computation (ABC) framework for phylodynamic inference. ABC bypasses the need for an analytical solution of model likelihood, as it only requires the ability to simulate data from the model. I validate this "kernel-ABC" method for phylodynamic inference by estimating parameters from data simulated under a simple epidemiological model. Results indicate that kernel-ABC attained greater accuracy for parameters associated with virus transmission than leading software on the same data sets. Finally, I apply the kernel-ABC framework to study a recent outbreak of a recombinant HIV subtype in China. Kernel-ABC provides a versatile framework for phylodynamic inference because it can fit a broader range of models than methods that rely on the computation of exact likelihoods.
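The kernel-ABC idea described above can be illustrated with a minimal sketch; it is not the implementation from the study. It makes several assumptions that do not come from the abstract: trees are unlabeled nested tuples without branch lengths, the kernel is a simplified Collins-Duffy-style subset tree kernel with a hypothetical `decay` parameter, the distance is one minus the normalized kernel score, the "model" is a toy tree simulator with a single hypothetical `imbalance` parameter standing in for an epidemiological simulator, and the inference step is plain ABC rejection sampling with a uniform prior.

```python
import math
import random


def subset_tree_kernel(t1, t2, decay=0.5):
    """Simplified subset tree kernel on unlabeled trees (branch lengths ignored).

    A tree is a nested tuple; a tip is the empty tuple (). Child order matters,
    so trees are assumed to be stored in a consistent (sorted) order.
    """
    def internal_nodes(t):
        out = [t] if t else []
        for child in t:
            out.extend(internal_nodes(child))
        return out

    def production(node):
        # For each child, record whether it is an internal node or a tip.
        return tuple(bool(child) for child in node)

    def common(n1, n2):
        # Decay-weighted count of shared subset trees rooted at n1 and n2.
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1, n2):
            if c1:  # both children are internal because the productions match
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


def kernel_distance(t1, t2, decay=0.5):
    """Assumed distance: 1 - k(t1, t2) / sqrt(k(t1, t1) * k(t2, t2))."""
    k12 = subset_tree_kernel(t1, t2, decay)
    k11 = subset_tree_kernel(t1, t1, decay)
    k22 = subset_tree_kernel(t2, t2, decay)
    return 1.0 - k12 / math.sqrt(k11 * k22)


def count_tips(tree):
    return 1 if not tree else sum(count_tips(child) for child in tree)


def simulate_tree(n_tips, imbalance, rng):
    """Toy simulator (hypothetical): with probability `imbalance` a node splits
    off a single tip; otherwise the split size is drawn uniformly."""
    if n_tips == 1:
        return ()
    if rng.random() < imbalance:
        k = 1
    else:
        k = rng.randint(1, n_tips - 1)
    left = simulate_tree(k, imbalance, rng)
    right = simulate_tree(n_tips - k, imbalance, rng)
    return tuple(sorted((left, right)))  # consistent child order


def abc_rejection(observed, n_draws, tolerance, rng):
    """ABC rejection: keep prior draws whose simulated tree is within
    `tolerance` of the observed tree under the kernel distance."""
    n_tips = count_tips(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # Uniform(0, 1) prior on the toy parameter
        simulated = simulate_tree(n_tips, theta, rng)
        if kernel_distance(observed, simulated) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    observed = simulate_tree(12, 0.8, rng)  # pretend this tree is the data
    posterior_sample = abc_rejection(observed, 500, 0.5, rng)
    print(len(posterior_sample), "draws accepted")
```

No likelihood is evaluated anywhere in the sketch: the only requirements are the ability to simulate a tree for a given parameter value and the ability to score its similarity to the observed tree. The tolerance of 0.5 in the usage example is arbitrary; a smaller tolerance gives a closer approximation at the cost of fewer accepted draws.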