iBCE-EL: A New Ensemble Learning Framework for Improved Linear B-Cell Epitope Prediction.

Affiliations

Department of Physiology, Ajou University School of Medicine, Suwon, South Korea.

Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, United States.

Publication information

Front Immunol. 2018 Jul 27;9:1695. doi: 10.3389/fimmu.2018.01695. eCollection 2018.

DOI:10.3389/fimmu.2018.01695

PMID:30100904

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6072840/

Abstract

Identification of B-cell epitopes (BCEs) is a fundamental step in epitope-based vaccine development, antibody production, and disease prevention and diagnosis. Given the avalanche of protein sequence data generated in the postgenomic age, an automated computational method is essential for fast and accurate identification of novel BCEs within the vast number of candidate proteins and peptides. Although several computational methods have been developed, their accuracy is unreliable. Thus, a reliable model with significantly improved prediction is highly desirable. In this study, we first constructed a non-redundant data set of 5,550 experimentally validated BCEs and 6,893 non-BCEs from the Immune Epitope Database. We then developed a novel ensemble learning framework for improved linear BCE prediction, called iBCE-EL, which fuses two independent predictors: an extremely randomized tree (ERT) classifier, which uses a combination of physicochemical properties (PCP) and amino acid composition as input features, and a gradient boosting (GB) classifier, which uses a combination of dipeptide composition and PCP. Cross-validation analysis on a benchmarking data set showed that iBCE-EL performed better than the individual classifiers (ERT and GB), with a Matthews correlation coefficient (MCC) of 0.454. Furthermore, we evaluated iBCE-EL on an independent data set, where it significantly outperformed the state-of-the-art method with an MCC of 0.463. To the best of our knowledge, iBCE-EL is the first ensemble method for linear BCE prediction. iBCE-EL is implemented as a web-based platform, available at http://thegleelab.org/iBCE-EL, and offers two prediction modes: the first classifies peptide sequences as BCEs or non-BCEs, while the second lets users mine potential BCEs from protein sequences.
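
To make the feature types and the evaluation metric named in the abstract concrete, here is a minimal sketch in plain Python. It computes amino acid composition and dipeptide composition for a peptide, and the Matthews correlation coefficient from binary labels. The function names (`aac`, `dpc`, `fuse`, `mcc`) are illustrative and not taken from the paper. The abstract does not say how the ERT and GB outputs are fused, so the probability-averaging rule in `fuse` and its 0.5 threshold are assumptions. The PCP features and the classifiers themselves are omitted.

```python
from itertools import product
from math import sqrt
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(peptide: str) -> List[float]:
    """Amino acid composition: fraction of each of the 20 residues."""
    n = len(peptide)
    return [peptide.count(a) / n for a in AMINO_ACIDS]


def dpc(peptide: str) -> List[float]:
    """Dipeptide composition: fraction of each of the 400 adjacent residue pairs."""
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    total = len(pairs)
    return [pairs.count(a + b) / total for a, b in product(AMINO_ACIDS, repeat=2)]


def fuse(p_ert: float, p_gb: float, threshold: float = 0.5) -> int:
    """ASSUMED fusion rule (not stated in the abstract):
    average the two predicted probabilities, then threshold."""
    return int((p_ert + p_gb) / 2 >= threshold)


def mcc(y_true: List[int], y_pred: List[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = BCE, 0 = non-BCE)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a fuller pipeline, vectors like these (together with PCP descriptors) would feed the two classifiers, and `mcc` would score their fused predictions against held-out labels.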

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dbe8/6072840/150225dc9bab/fimmu-09-01695-g001.jpg
