Linear B-cell epitope prediction for SARS and COVID-19 vaccine design: Integrating balanced ensemble learning models and resampling strategies.

Author information

Gurcan Fatih

Affiliation

Department of Management Information Systems, Faculty of Economics and Administrative Sciences, Karadeniz Technical University, Trabzon, Turkey.

Publication information

PeerJ Comput Sci. 2025 Jun 18;11:e2970. doi: 10.7717/peerj-cs.2970. eCollection 2025.

DOI:10.7717/peerj-cs.2970
PMID:40567760
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12193457/
Abstract

This study presents a comprehensive machine learning framework to enhance the prediction accuracy of B-cell epitopes and antibody recognition related to Severe Acute Respiratory Syndrome (SARS) and Coronavirus Disease 2019 (COVID-19). To address the issue of data imbalance, various resampling techniques were applied using three types of strategies: over-sampling, under-sampling, and hybrid-sampling. The implemented resampling methods were designed to improve class balance and enhance model training. The rebalanced datasets were then used in model building with ensemble classifiers employing Boosting, Bagging, and Balancing strategies. Hyperparameter optimization for the classifiers was conducted using GridSearchCV, while feature selection was performed with the recursive feature elimination (RFE) algorithm. Model performance was evaluated using seven different metrics: Accuracy, Precision, Recall, F1-score, receiver operating characteristic area under the curve (ROC AUC), precision-recall area under the curve (PR AUC), and Matthews correlation coefficient (MCC). Furthermore, statistical significance analyses including paired t-test, Wilcoxon, and permutation tests confirmed the reliability of the model improvements. To interpret the model's predictive behavior, peptides with the highest confidence among correctly classified instances were identified as potential epitope candidates. The results indicate that the combination of Synthetic Minority Over-Sampling Technique-Edited Nearest Neighbors (SMOTE-ENN) and ExtraTrees yielded the best performance, achieving an ROC AUC score of 0.9899. The combination of Instance Hardness Threshold (IHT) and ExtraTrees followed closely with a score of 0.9799. These findings emphasize the effectiveness of integrating resampling models and balancing ensemble classifiers in improving the accuracy of B-cell epitope prediction and antibody recognition for SARS and COVID-19 infections. This study contributes to vaccine development efforts and the advancement of immunoinformatics research by identifying promising epitope candidates.
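
The abstract names SMOTE-ENN as the best-performing resampler but does not show how the hybrid strategy operates. The following is a minimal from-scratch sketch, not the study's code: the abstract does not describe the implementation beyond naming GridSearchCV and RFE. It illustrates the two stages on toy 2-D data. SMOTE first creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours; ENN then removes samples whose label disagrees with the majority vote of their nearest neighbours. The function names, `k=3`, the majority-vote rule, and the toy data are all assumptions made for illustration, and library implementations may differ in defaults such as the neighbour count or the voting rule.

```python
import math
import random
from collections import Counter


def k_nearest(point, pool, k):
    """Indices of the k points in `pool` closest to `point` (Euclidean distance)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(X, y, minority, k=3, rng=None):
    """Over-sample `minority` by interpolation until it matches the largest class."""
    rng = rng or random.Random(0)
    seeds = [x for x, label in zip(X, y) if label == minority]
    n_new = max(Counter(y).values()) - len(seeds)
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.randrange(len(seeds))
        # Candidate partners are the other minority points only.
        others = [s for idx, s in enumerate(seeds) if idx != i]
        mate = others[rng.choice(k_nearest(seeds[i], others, k))]
        gap = rng.random()  # position along the segment seed -> mate
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(seeds[i], mate)))
        y_out.append(minority)
    return X_out, y_out


def edited_nearest_neighbours(X, y, k=3):
    """Drop every sample whose label disagrees with the vote of its k neighbours."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(x, X[j]))[:k]
        vote = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if vote == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data (hypothetical): 20 majority points, 5 minority points.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]
y = [0] * 20 + [1] * 5

X_bal, y_bal = smote(X, y, minority=1, k=3, rng=rng)             # over-sampling stage
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal, k=3)  # cleaning stage

print("original:  ", Counter(y))
print("after SMOTE:", Counter(y_bal))
print("after ENN:  ", Counter(y_clean))
```

In a workflow like the one the abstract describes, resampling of this kind would be applied to training data only, with the rebalanced set then passed to a classifier such as ExtraTrees; evaluating on resampled data would inflate the reported metrics.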

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6de0/12193457/3174a9525855/peerj-cs-11-2970-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6de0/12193457/a0ac495949fd/peerj-cs-11-2970-g001.jpg
