Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China.
Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae245.
In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Advances in deep learning have enabled accurate prediction of protein structure and function by leveraging co-evolutionary information from homologous proteins. Despite these advances, predicting antibody conformation remains challenging because of antibodies' unique evolution and the high flexibility of their antigen-binding regions. To address this challenge, we present the Bio-inspired Antibody Language Model (BALM). The model is trained on a dataset of 336 million unlabeled, 40% nonredundant antibody sequences and captures both the unique and the conserved properties specific to antibodies. BALM achieves exceptional performance across four antigen-binding prediction tasks. We also introduce BALMFold, an end-to-end method derived from BALM that rapidly predicts full-atom antibody structures from individual sequences. BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold and OmegaFold on the antibody benchmark, demonstrating significant potential to advance innovative engineering and streamline therapeutic antibody development by reducing unnecessary trials. The BALMFold structure prediction server is freely available at https://beamlab-sh.com/models/BALMFold.