Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, USA.
Mol Biol Evol. 2024 Jan 3;41(1). doi: 10.1093/molbev/msad282.
When the time of an HIV transmission event is unknown, methods to identify it from virus genetic data can reveal the circumstances that enable transmission. We developed a single-parameter Markov model to infer transmission time from an HIV phylogeny constructed from multiple virus sequences sampled from the people in a transmission pair. Our method quantifies the statistical support for transmission occurring in different possible time slices. We compared our time-slice model results to previously described methods: a tree-based logical transmission interval, a simple parsimony-like rules-based method, and a more complex coalescent model. Across simulations with multiple transmitted lineages, different transmission times relative to the source's infection, and different sampling times relative to transmission, we found that overall our time-slice model provided accurate and narrower estimates of the time of transmission. We also identified situations in which transmission time or direction was difficult to estimate by any method, particularly when transmission occurred long after the source was infected and when sampling occurred long after transmission. Applying our model to real HIV transmission pairs showed some agreement with facts known from the case investigations. We also found, however, that uncertainty in the inferred transmission time was driven more by uncertainty from time calibration of the phylogeny than by the model inference itself. Encouragingly, the comparable performance of the Markov time-slice model and the coalescent model, which make use of different information within a tree, suggests that a new method remains to be described that will make full use of the topology and node times for improved transmission time inference.