Prediction of linear B-cell epitopes based on protein sequence features and BERT embeddings.

Affiliations

School of Humanistic Medicine, Anhui Medical University, Hefei, 230032, Anhui, China.

School of Biomedical Engineering, Anhui Medical University, Hefei, 230030, Anhui, China.

Publication information

Sci Rep. 2024 Jan 30;14(1):2464. doi: 10.1038/s41598-024-53028-w.

Abstract

Linear B-cell epitopes (BCEs) play a key role in the development of peptide vaccines and immunodiagnostic reagents. Accurate identification of linear BCEs is therefore of great importance for the prevention of infectious diseases and the diagnosis of related diseases. Experimental methods for identifying BCEs are expensive and time-consuming, and they cannot meet the demand for identification across large-scale protein sequence data. There is thus a need for an efficient and accurate computational method that can rapidly identify linear BCE sequences. In this work, we developed LBCE-BERT, a new linear BCE prediction method. The method combines peptide chain sequence information with embeddings from the natural language model BERT and uses an XGBoost classifier. The model was trained on three benchmark datasets, which were also used for hyperparameter selection, and was subsequently evaluated on several test datasets. The results indicate that our proposed method outperforms other methods in terms of AUROC and accuracy. The LBCE-BERT model is publicly available at: https://github.com/Lfang111/LBCE-BERT .
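The abstract describes a pipeline that concatenates hand-crafted peptide sequence features with BERT embeddings, feeds the result to a classifier, and reports AUROC and accuracy. The sketch below illustrates only the feature-concatenation and evaluation steps under stated assumptions. It uses amino acid composition (AAC) as a stand-in sequence feature, which is an assumption since the abstract does not list the exact features. The embedding is a placeholder vector supplied by the caller, and the XGBoost classifier is omitted. The helper names `aac_features`, `combine_features`, `auroc`, and `accuracy` are hypothetical and are not taken from the LBCE-BERT repository.

```python
from typing import List, Sequence

# The 20 standard amino acids; ordering fixes the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(peptide: str) -> List[float]:
    """Amino acid composition (assumed example feature): fraction of each standard residue."""
    peptide = peptide.upper()
    n = len(peptide)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [peptide.count(aa) / n for aa in AMINO_ACIDS]


def combine_features(peptide: str, embedding: Sequence[float]) -> List[float]:
    """Concatenate sequence features with a precomputed (placeholder) embedding vector."""
    return aac_features(peptide) + list(embedding)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    # Toy usage: a 4-residue peptide plus a 2-dimensional placeholder embedding.
    print(len(combine_features("ACDA", [0.1, -0.2])))  # 22 features
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(auroc(labels, scores))     # 0.75
    print(accuracy(labels, scores))  # 0.5
```

The pairwise AUROC formulation is quadratic in the number of examples. It was chosen here for readability; a rank-based computation gives the same value and scales better on large test sets.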
