Department of Automatic Control and Systems Engineering, Politehnica University of Bucharest, Bucharest 060042, Romania.
Bioinformatics. 2022 Sep 15;38(18):4278-4285. doi: 10.1093/bioinformatics/btac530.
Knowing how sensitive a viral strain is to a monoclonal antibody is of interest for HIV vaccine development and therapy. HIV strains vary in their resistance to antibodies, and accurate prediction of virus-antibody sensitivity can be used to find potent antibody combinations that broadly neutralize multiple, diverse HIV strains. Sensitivity prediction can be combined with other methods, such as generative algorithms to design novel antibodies in silico, or feature selection to uncover sites of interest in the sequence. However, these tools are of limited use without accurate in silico prediction methods.
Our method leverages the CATNAP dataset, probably the most comprehensive collection of HIV antibody assays, and predicts antibody-virus sensitivity as a binary classification task. The methods proposed by others focus primarily on analyzing the virus sequences. Our article instead demonstrates the advantages of modeling antibody-virus sensitivity as a function of both the virus and the antibody sequences. The input consists of the amino acid sequences of the virus envelope and the antibody variable region. No structural features are required, which makes our system very practical, given that sequence data is more common than structural data. We compare against two other state-of-the-art methods that leverage the same dataset and use sequence data only. Our approach, based on neural networks and transfer learning, achieves higher predictive performance, as measured on a set of 31 specific broadly neutralizing antibodies.
https://github.com/vlad-danaila/deep_hiv_ab_pred/tree/fc-att-fix.
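As a rough illustration of the input described above (a virus envelope sequence paired with an antibody variable region sequence, plus a binary sensitivity label), the following minimal sketch shows one way such a pair could be turned into fixed-length integer tokens. It is not taken from the linked repository: the function names, the token numbering, the padding lengths, and the toy sequences are all assumptions made for illustration only.

```python
from typing import Dict, List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical token scheme: 0 = padding, 1..20 = amino acids, 21 = unknown residue.
TOKEN_INDEX: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
PAD = 0
UNKNOWN = len(AMINO_ACIDS) + 1


def encode_sequence(sequence: str, length: int) -> List[int]:
    """Map residues to integer tokens, then truncate or right-pad to `length`."""
    tokens = [TOKEN_INDEX.get(residue, UNKNOWN) for residue in sequence.upper()[:length]]
    return tokens + [PAD] * (length - len(tokens))


def build_example(virus_envelope: str, antibody_variable: str, sensitive: bool,
                  envelope_length: int = 12, antibody_length: int = 8) -> dict:
    """Pair both encoded sequences with a binary label (1 = sensitive, 0 = resistant).

    The default lengths are arbitrary toy values, not the ones used in the article.
    """
    return {
        "virus": encode_sequence(virus_envelope, envelope_length),
        "antibody": encode_sequence(antibody_variable, antibody_length),
        "label": int(sensitive),
    }


if __name__ == "__main__":
    # Made-up toy sequences, far shorter than real envelope or antibody sequences.
    example = build_example("MRVKGIQ", "QVQLVQ", sensitive=True)
    print(example)
```

Feeding both encoded sequences to a classifier, rather than the virus sequence alone, is what allows a single model to cover many antibodies; how the article's network actually consumes these inputs is not specified in the abstract.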