Unité de Bioinformatique Évolutive, Institut Pasteur, Paris, France.
Sorbonne Université, Collège doctoral, Paris, France.
PLoS Comput Biol. 2021 Aug 26;17(8):e1008873. doi: 10.1371/journal.pcbi.1008873. eCollection 2021 Aug.
Drug resistance mutations (DRMs) appear in HIV under treatment pressure and are commonly transmitted to naive patients. The standard approach to reveal new DRMs is to test for significant differences in mutation frequency between treated and naive patients. However, this approach considers each mutation individually and cannot capture interactions between several mutations. Here, we aim to leverage the ever-growing quantity of high-quality sequence data and machine learning methods to study such interactions (i.e. epistasis), as well as to find new DRMs. We trained classifiers to discriminate between Reverse Transcriptase Inhibitor (RTI)-experienced and RTI-naive samples on a large HIV-1 reverse transcriptase (RT) sequence dataset from the UK (n ≈ 55,000), using all observed mutations as binary representation features. To assess the robustness of our findings, our classifiers were evaluated on independent data sets, both from the UK and Africa. Important representation features for each classifier were then extracted as potential DRMs. To find novel DRMs, we repeated this process after removing either the features or the samples associated with known DRMs. When all known resistance signal was kept, we detected sufficiently prevalent known DRMs, thus validating the approach. When features corresponding to known DRMs were removed, our classifiers retained some prediction accuracy, and six new mutations significantly associated with resistance were identified. These six mutations have a low genetic barrier, are correlated with known DRMs, and are spatially close to either the RT active site or the regulatory binding pocket. When both known DRM features and sequences containing at least one known DRM were removed, our classifiers lost all prediction accuracy. These results likely indicate that all mutations directly conferring resistance have been found, and that our newly discovered DRMs are accessory or compensatory mutations. Moreover, apart from the accessory nature of the relationships we found, we did not find any significant signal of further, more subtle epistasis, in which several mutations that individually do not seem to confer any resistance do so in combination.