Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America.
Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden.
PLoS Comput Biol. 2022 Aug 26;18(8):e1009741. doi: 10.1371/journal.pcbi.1009741. eCollection 2022 Aug.
To identify and stop active HIV transmission chains, new epidemiological techniques are needed. Here, we describe the development of a multi-biomarker augmentation to phylogenetic inference of the underlying transmission history in a local population. HIV biomarkers are measurable biological quantities that have some relationship to the amount of time someone has been infected with HIV. To train our model, we used five biomarkers based on real data from serological assays, HIV sequence data, and target cell counts in longitudinally followed, untreated patients with known infection times. The biomarkers were modeled with a mixed-effects framework to allow for patient-specific variation and general trends, and fit to patient data using Markov chain Monte Carlo (MCMC) methods. Subsequently, the density of the unobserved infection time conditional on observed biomarkers was obtained by integrating out the random effects from the model fit. This probabilistic information about infection times was incorporated into the likelihood function for the transmission history and phylogenetic tree reconstruction, informed by the HIV sequence data. To critically test our methodology, we developed a coalescent-based simulation framework that generates phylogenies and biomarkers given a specific or general transmission history. Testing on many epidemiological scenarios showed that biomarker-augmented phylogenetics can reach 90% accuracy under idealized situations. Under realistic within-host HIV-1 evolution, involving substantial within-host diversification and frequent transmission of multiple lineages, the average accuracy was about 50% in transmission clusters involving 5-50 hosts. Realistic biomarker data added on average 16 percentage points over using the phylogeny alone. Using more biomarkers improved the performance.
Shorter temporal spacing between transmission events and increased transmission heterogeneity reduced reconstruction accuracy, but larger clusters were not harder to reconstruct. More sequence data per infected host also improved accuracy. We show that the method is robust to incomplete sampling and that adding biomarkers improves reconstructions of real HIV-1 transmission histories. The technology presented here could allow for better prevention programs by providing data for locally informed and tailored strategies.
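The step of obtaining an infection-time density by integrating out random effects can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a single hypothetical biomarker with a log-shaped trend, a patient-specific random intercept, Gaussian measurement noise, made-up parameter values, and a flat prior on infection time over a grid. The study itself fits five biomarkers with MCMC; this sketch replaces that with plain Monte Carlo averaging over draws of the random effect.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def expected_biomarker(t_years, intercept, slope):
    """Hypothetical trend: the biomarker rises with log(1 + time since infection)."""
    return intercept + slope * math.log1p(t_years)


def infection_time_density(observed, t_grid, n_draws=2000, seed=0,
                           mu_intercept=0.5, sd_intercept=0.3,
                           slope=1.0, sd_noise=0.2):
    """Density of time since infection given one observed biomarker value.

    The patient-specific intercept (the random effect) is integrated out by
    averaging the likelihood over draws from its assumed population
    distribution. A flat prior on t over the grid is assumed.
    """
    rng = random.Random(seed)
    intercepts = [rng.gauss(mu_intercept, sd_intercept) for _ in range(n_draws)]

    unnormalized = []
    for t in t_grid:
        # Monte Carlo estimate of the marginal likelihood p(observed | t).
        lik = sum(
            normal_pdf(observed, expected_biomarker(t, a, slope), sd_noise)
            for a in intercepts
        ) / n_draws
        unnormalized.append(lik)

    # Normalize so the density integrates to 1 over the (evenly spaced) grid.
    step = t_grid[1] - t_grid[0]
    total = sum(unnormalized) * step
    return [u / total for u in unnormalized]


# Grid of candidate times since infection, 0.05 to 10 years.
t_grid = [0.05 * k for k in range(1, 201)]
density = infection_time_density(observed=1.6, t_grid=t_grid)
t_mode = t_grid[density.index(max(density))]
print(f"most probable time since infection: about {t_mode:.2f} years")
```

With several biomarkers, one plausible extension is to multiply the per-biomarker likelihoods inside the average when their noise terms are treated as independent given the random effects; whether that matches the paper's model is not stated in the abstract.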