Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark.
La Jolla Institute for Immunology, La Jolla, California, USA.
Protein Sci. 2022 Dec;31(12):e4497. doi: 10.1002/pro.4497.
B-cell epitope prediction tools are of great medical and commercial interest due to their practical applications in vaccine development and disease diagnostics. Protein language models (LMs), trained on unprecedentedly large datasets of protein sequences and structures, provide a powerful numeric representation that can be exploited to accurately predict local and global protein structural features from amino acid sequences alone. In this paper, we present BepiPred-3.0, a sequence-based epitope prediction tool that, by exploiting LM embeddings, greatly improves prediction accuracy for both linear and conformational epitopes on several independent test sets. Furthermore, careful selection of additional input variables and of the epitope residue annotation strategy improved performance further, achieving unprecedented predictive power. Our tool can predict epitopes across hundreds of sequences in minutes. It is freely available as a web server and a standalone package at https://services.healthtech.dtu.dk/service.php?BepiPred-3.0, with a user-friendly interface to navigate the results.