Obeng Billal M, Kouyos Roger D, Kusejko Katharina, Salazar-Vizcaya Luisa, Günthard Huldrych F, Kelleher Anthony D, Di Giallonardo Francesca
The Kirby Institute, UNSW Sydney, Sydney, Australia.
Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, Zurich, Switzerland.
Virology. 2025 Jul;608:110558. doi: 10.1016/j.virol.2025.110558. Epub 2025 Apr 29.
HIV-1 cluster analysis has been widely used to characterize HIV-1 transmission, and some countries have implemented such molecular epidemiology as part of their prevention strategy. However, HIV-1 sequences derive from varying genome regions, which affects phylogenetic clustering outputs. Here, we apply different tools to run a sensitivity analysis assessing which threshold gives the most cohesive clustering outputs for different data sources. We used a dataset of 174 full-length subtype B sequences from the Swiss HIV Cohort Study and publicly available subtype C sequences from South Africa. Each dataset was divided into sub-genomic sub-datasets covering gag, pol, and env. pol was further subdivided into regions commonly used in HIV-1 genotyping laboratories (pr-rt, rt-int, and pr-rt-int). Cluster analyses for each sub-genomic region were performed in ClusterPicker, specifying genetic distance thresholds from 0.5% to 4.5% and tree branch support of 70%, 90%, and 99%. Tree topologies and clustering outputs were compared against each other to assess cluster similarity. Phylogenies using pol, pr-rt-int, or rt-int had more robust tree topologies than those using gag and env. Cluster composition changed with increasing genetic distance threshold but was not affected by branch support. Cluster identity was most similar around genetic distances of 2.5% (±0.5%) for all sub-genomic regions and for both subtypes B and C. Our study demonstrates the value of performing a sensitivity analysis before setting a genetic distance threshold for clustering, and shows that the pol region is appropriate for clustering and can be used for near real-time HIV-1 cluster detection.
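The threshold sweep described in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline: ClusterPicker works on a phylogenetic tree and also applies a branch support cutoff, whereas this sketch ignores the tree and simply links sequences whose pairwise p-distance is at or below the threshold (single linkage). The function names, the toy sequences, and the use of a Jaccard index over co-clustered pairs as the similarity measure are all assumptions made for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping positions with a gap or ambiguity."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def clusters_at(seqs, threshold):
    """Single-linkage clusters (size >= 2) at a given distance threshold.

    Assumption: a distance-only stand-in for tree-based clustering.
    """
    parent = {name: name for name in seqs}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) > 1]


def co_clustered_pairs(clusters):
    """All unordered sequence pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}


def jaccard(clusters_1, clusters_2):
    """Jaccard index of co-clustered pairs (hypothetical similarity measure)."""
    p1, p2 = co_clustered_pairs(clusters_1), co_clustered_pairs(clusters_2)
    if not p1 and not p2:
        return 1.0
    return len(p1 & p2) / len(p1 | p2)


def sensitivity(region_a, region_b, thresholds):
    """Cluster similarity between two genome regions at each threshold."""
    return {
        t: jaccard(clusters_at(region_a, t), clusters_at(region_b, t))
        for t in thresholds
    }


# Toy alignments (invented data); real sub-genomic alignments would be far longer.
toy_pol = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGA",
    "s3": "TTTTACGTACGTACGTACGT",
}
toy_env = {
    "s1": "GGGTACCTACGTACGTACGT",
    "s2": "GGGTACCTACGTACGAACGT",
    "s3": "GGGTACCTACGAACGTACGT",
}

if __name__ == "__main__":
    for t, score in sensitivity(toy_pol, toy_env, [0.01, 0.06, 0.11, 0.16]).items():
        print(f"threshold={t:.2f} similarity={score:.2f}")
```

Because the toy sequences are only 20 sites long, the thresholds in the usage loop are much coarser than the 0.5% to 4.5% range examined in the study. With real alignments, the same loop would be run over that range for each pair of sub-genomic regions.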