Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia.
Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad114.
Identifying B-cell epitopes is an essential step in vaccine design, immunodiagnostic tests and antibody production. Several computational approaches have been proposed to identify, from an antigen protein or peptide sequence, which residues are more likely to be part of an epitope, but they show limited performance on relatively homogeneous data sets and lack interpretability, which limits the biological insights that could otherwise be obtained. To address these limitations, we have developed epitope1D, an explainable machine learning method capable of accurately identifying linear B-cell epitopes. It leverages two new descriptors: (i) a graph-based signature representation of protein sequences, based on our well-established Cutoff Scanning Matrix algorithm, and (ii) Organism Ontology information. Our model achieved an area under the ROC curve of up to 0.935 on cross-validation and blind tests, demonstrating robust performance. A comprehensive comparison with alternative methods was also carried out on distinct benchmark data sets, with our model outperforming state-of-the-art tools. epitope1D not only represents a significant advance in predictive performance, but also allows biologically meaningful features to be combined and used for model interpretation. epitope1D is available as a user-friendly web server and application programming interface at https://biosig.lab.uq.edu.au/epitope1d/.
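To make the idea of a graph-based, cutoff-scanning signature over a sequence more concrete, the following is a minimal sketch and not the epitope1D implementation. It assumes that residues are nodes, that the distance between two residues is their separation along the sequence, and that residues are grouped into three coarse classes; the class table, the function name `cutoff_scanning_signature` and the `max_cutoff` parameter are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

# Hypothetical coarse residue classes, chosen only for illustration.
RESIDUE_CLASS = {}
for aa in "AVLIMFWC":
    RESIDUE_CLASS[aa] = "hyd"  # hydrophobic
for aa in "STNQYGP":
    RESIDUE_CLASS[aa] = "pol"  # polar
for aa in "DEKRH":
    RESIDUE_CLASS[aa] = "chg"  # charged

# Every unordered pair of classes, in a fixed order, so that all
# peptides yield feature vectors with the same layout.
PAIR_TYPES = list(combinations_with_replacement(sorted(set(RESIDUE_CLASS.values())), 2))


def cutoff_scanning_signature(sequence, max_cutoff=3):
    """Count residue-class pairs within each sequence-distance cutoff.

    Returns a dict keyed by (cutoff, (class_a, class_b)). Counts are
    cumulative: a pair separated by 1 is also counted at cutoff 2, 3, ...
    """
    classes = [RESIDUE_CLASS[aa] for aa in sequence]
    signature = {}
    for cutoff in range(1, max_cutoff + 1):
        counts = Counter()
        for i, j in combinations(range(len(classes)), 2):
            if j - i <= cutoff:
                pair = tuple(sorted((classes[i], classes[j])))
                counts[pair] += 1
        for pair_type in PAIR_TYPES:
            signature[(cutoff, pair_type)] = counts[pair_type]
    return signature


if __name__ == "__main__":
    sig = cutoff_scanning_signature("AKD", max_cutoff=2)
    for key, value in sig.items():
        print(key, value)
```

Under these assumptions, each peptide maps to a fixed-length vector of pair counts that could be passed to any standard classifier, and each feature remains readable as "how many class A / class B pairs lie within k positions of each other".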