Microsoft Research, Duke University, Durham, United States.
Department of Computer Science, Duke University, Durham, United States.
Elife. 2024 Sep 9;13:RP94899. doi: 10.7554/eLife.94899.
Understanding the interplay between the HIV reservoir and the host immune system may yield insights into HIV persistence during antiretroviral therapy (ART) and inform strategies for a cure. Here, we applied machine learning (ML) approaches to cross-sectional, high-parameter HIV reservoir and immunology data to characterize host-reservoir associations and generate new hypotheses about HIV reservoir biology. High-dimensional immunophenotyping, quantification of HIV-specific T cell responses, and measurement of genetically intact and total HIV proviral DNA frequencies were performed on peripheral blood samples from 115 people with HIV (PWH) on long-term ART. Analyses showed that both intact and total proviral DNA frequencies were positively correlated with T cell activation and exhaustion. Years of ART and select bifunctional HIV-specific CD4 T cell responses were negatively correlated with the percentage of intact proviruses. A leave-one-covariate-out inference approach identified specific HIV reservoir and clinical-demographic parameters, such as age and biological sex, that were particularly important in predicting immunophenotypes. Overall, immune parameters were more strongly associated with total HIV proviral frequencies than with intact proviral frequencies. Uniquely, however, expression of the IL-7 receptor alpha chain (CD127) on CD4 T cells was more strongly correlated with the intact reservoir. Unsupervised dimension reduction analysis identified two main clusters of PWH with distinct immune and reservoir characteristics. Using reservoir correlates identified in these initial analyses, we employed decision tree methods to visualize relationships among multiple immune and clinical-demographic parameters and the HIV reservoir. Finally, using random splits of our data into training and test sets, ML algorithms predicted with approximately 70% accuracy whether a given participant had qualitatively high or low levels of total or intact HIV DNA.
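The leave-one-covariate-out (LOCO) idea ranks predictors by how much held-out prediction error grows when one covariate is removed and the model is refit. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes an ordinary least-squares model, mean squared error as the loss, and simulated data standing in for an immunophenotype (`y`) and reservoir/clinical-demographic covariates (`X`). The helper names `fit_ols`, `predict_ols`, and `loco_importance` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept column (assumed model); returns coefficients."""
    Xd = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return coef[0] + X @ coef[1:]

def loco_importance(X, y, n_splits=20, test_frac=0.3, seed=0):
    """Hypothetical LOCO sketch: for each covariate j, the increase in held-out
    MSE when the model is refit without column j, averaged over random splits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_test = int(round(test_frac * n))
    deltas = np.zeros((n_splits, p))
    for s in range(n_splits):
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        # Reference error of the full model on the held-out participants.
        full = fit_ols(X[train], y[train])
        mse_full = np.mean((y[test] - predict_ols(full, X[test])) ** 2)
        for j in range(p):
            keep = [k for k in range(p) if k != j]
            # Refit without covariate j and measure the change in held-out error.
            reduced = fit_ols(X[train][:, keep], y[train])
            mse_red = np.mean((y[test] - predict_ols(reduced, X[test][:, keep])) ** 2)
            deltas[s, j] = mse_red - mse_full
    return deltas.mean(axis=0)

# Simulated example (hypothetical data): only covariates 0 and 2 carry signal.
rng = np.random.default_rng(42)
n = 115
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)
importance = loco_importance(X, y)
print(np.round(importance, 2))
```

A large positive value means the covariate carries predictive information not available from the others; values near zero suggest it can be dropped with little loss.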
The techniques described here may be useful for assessing global patterns within the increasingly high-dimensional data used in HIV reservoir and other studies of complex biology.
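The final classification step (predicting qualitatively high versus low HIV DNA and scoring accuracy over random training-test splits) can be sketched as follows. This is a hedged illustration, not the study's method: a simple nearest-centroid classifier stands in for the ML algorithms used in the study, "high" is assumed to mean above the cohort median, and the features and HIV DNA values are simulated. The function name `nearest_centroid_accuracy` is hypothetical.

```python
import numpy as np

def nearest_centroid_accuracy(X, labels, n_splits=100, test_frac=0.3, seed=0):
    """Mean test accuracy of a nearest-centroid classifier over repeated
    random train/test splits. labels is a 0/1 array (0 = low, 1 = high)."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    n_test = int(round(test_frac * n))
    accs = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        # Standardize with training-set statistics only, to avoid test-set leakage.
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0)
        sd[sd == 0] = 1.0
        Ztr = (X[train] - mu) / sd
        Zte = (X[test] - mu) / sd
        # One centroid per class, then assign each test participant to the nearer one.
        c0 = Ztr[labels[train] == 0].mean(axis=0)
        c1 = Ztr[labels[train] == 1].mean(axis=0)
        d0 = np.linalg.norm(Zte - c0, axis=1)
        d1 = np.linalg.norm(Zte - c1, axis=1)
        pred = (d1 < d0).astype(int)
        accs.append(np.mean(pred == labels[test]))
    return float(np.mean(accs))

# Simulated example (hypothetical data): 115 participants, 6 immune features.
rng = np.random.default_rng(1)
n, p = 115, 6
X = rng.normal(size=(n, p))
log_dna = 1.0 * X[:, 0] + 0.7 * X[:, 1] + rng.normal(size=n)
labels = (log_dna > np.median(log_dna)).astype(int)  # assumed median split
acc = nearest_centroid_accuracy(X, labels)
shuffled = nearest_centroid_accuracy(X, rng.permutation(labels))  # chance baseline
print(round(acc, 2), round(shuffled, 2))
```

Comparing against shuffled labels gives a rough chance-level baseline, which helps judge whether an accuracy near 70% reflects real structure in the data.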