Semenova Lesia, Wang Yingfan, Falcinelli Shane, Archin Nancie, Cooper-Volkheimer Alicia D, Margolis David M, Goonetilleke Nilu, Murdoch David M, Rudin Cynthia D, Browne Edward P
bioRxiv. 2024 Jun 25:2023.11.16.567386. doi: 10.1101/2023.11.16.567386.
Understanding the interplay between the HIV reservoir and the host immune system may yield insights into HIV persistence during antiretroviral therapy (ART) and inform strategies for a cure. Here, we applied machine learning approaches to cross-sectional high-parameter HIV reservoir and immunology data in order to characterize host-reservoir associations and generate new hypotheses about HIV reservoir biology. High-dimensional immunophenotyping, quantification of HIV-specific T cell responses, and measurement of genetically intact and total HIV proviral DNA frequencies were performed on peripheral blood samples from 115 people with HIV (PWH) on long-term ART. Analysis demonstrated that both intact and total proviral DNA frequencies were positively correlated with T cell activation and exhaustion. Years of ART and select bifunctional HIV-specific CD4 T cell responses were negatively correlated with the percentage of intact proviruses. A Leave-One-Covariate-Out (LOCO) inference approach identified specific HIV reservoir and clinical-demographic parameters, such as age and biological sex, that were particularly important in predicting immunophenotypes. Overall, immune parameters were more strongly associated with total HIV proviral frequencies than intact proviral frequencies. Uniquely, however, expression of the IL-7 receptor alpha chain (CD127) on CD4 T cells was more strongly correlated with the intact reservoir. Unsupervised dimension reduction analysis identified two main clusters of PWH with distinct immune and reservoir characteristics. Using reservoir correlates identified in these initial analyses, decision tree methods were employed to visualize relationships among multiple immune and clinical-demographic parameters and the HIV reservoir. Finally, using random splits of our data as training-test sets, machine learning algorithms predicted with approximately 70% accuracy whether a given participant had qualitatively high or low levels of total or intact HIV DNA. The techniques described here may be useful for assessing global patterns within the increasingly high-dimensional data used in HIV reservoir and other studies of complex biology.
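The two evaluation ideas named in the abstract, repeated random training-test splits and Leave-One-Covariate-Out (LOCO) comparison, can be illustrated with a minimal sketch. This is not the authors' pipeline. The data are synthetic, the nearest-centroid classifier is a simple stand-in for whatever models the study used, and every name (`loco_importance`, `mean_accuracy`, `activation_marker`, `unrelated_marker`) is hypothetical. The cohort size of 115 is borrowed from the abstract only to keep the scale familiar. Here LOCO importance is approximated as the drop in mean held-out accuracy when one covariate is removed and the model is refit on the same splits.

```python
import random
import statistics


def nearest_centroid_fit(X, y):
    """Return one mean vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest in squared distance."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=sq_dist)


def split_accuracy(X, y, rng, test_fraction=0.3):
    """Accuracy on one random training-test split."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
    hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
    return hits / n_test


def mean_accuracy(X, y, n_splits=50, seed=0):
    """Mean held-out accuracy over repeated random splits (fixed seed)."""
    rng = random.Random(seed)
    return statistics.fmean([split_accuracy(X, y, rng) for _ in range(n_splits)])


def loco_importance(X, y, names, n_splits=50, seed=0):
    """Accuracy lost when each covariate is left out and the model is refit."""
    full = mean_accuracy(X, y, n_splits, seed)
    scores = {}
    for j, name in enumerate(names):
        X_drop = [[v for k, v in enumerate(row) if k != j] for row in X]
        scores[name] = full - mean_accuracy(X_drop, y, n_splits, seed)
    return full, scores


# Synthetic cohort (assumption): one marker tracks the reservoir, one does not.
data_rng = random.Random(42)
names = ["activation_marker", "unrelated_marker"]
X, reservoir = [], []
for _ in range(115):
    activation = data_rng.gauss(0, 1)
    unrelated = data_rng.gauss(0, 1)
    X.append([activation, unrelated])
    reservoir.append(activation + data_rng.gauss(0, 0.5))

# Label each synthetic participant as high or low relative to the median.
cutoff = statistics.median(reservoir)
y = ["high" if r > cutoff else "low" for r in reservoir]

full_acc, importance = loco_importance(X, y, names)
print(f"mean held-out accuracy: {full_acc:.2f}")
for name, drop in importance.items():
    print(f"accuracy lost without {name}: {drop:+.2f}")
```

Reusing the same seed for the full and reduced models means both are scored on identical splits, so the difference reflects the missing covariate rather than split-to-split noise. In this synthetic setup the covariate that drives the label should show a clearly larger accuracy drop than the unrelated one.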