AllerTrans: a deep learning method for predicting the allergenicity of protein sequences.

Author information

Sarlakifar Faezeh, Malek Hamed, Allahyari Fard Najaf

Affiliations

Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran.

Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran, Iran.

Publication information

Biol Methods Protoc. 2025 Jul 9;10(1):bpaf040. doi: 10.1093/biomethods/bpaf040. eCollection 2025.

DOI:10.1093/biomethods/bpaf040

PMID:40656558

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12254128/

Abstract

Allergens are a major concern in determining protein safety, especially with the growing use of recombinant proteins in new medical products. These proteins require a careful allergenicity assessment to guarantee their safety. However, traditional laboratory tests for allergenicity are expensive and time-consuming. To address this challenge, bioinformatics offers efficient and cost-effective alternatives for predicting protein allergenicity. Deep learning models offer a promising solution for this purpose. Recently, with the emergence of protein language models (pLMs), high-quality and impactful feature vectors can be extracted from protein sequences using these specialized language models. Although different computational methods can be effective individually, combining them could improve the prediction results. Given this hypothesis, can we develop a more powerful approach than existing methods to predict protein allergenicity? In this study, we developed an enhanced deep learning model to predict the potential allergenicity of proteins based on their primary structure represented as protein sequences. In simple terms, this model classifies protein sequences into allergenic or non-allergenic classes. Our approach utilizes two pLMs to extract distinct feature vectors for each sequence, which are then fed into a deep neural network (DNN) model for classification. Combining these feature vectors improves the results. Finally, we integrated our top-performing models using ensemble modeling techniques. This approach could balance the model's sensitivity and specificity. Our proposed model demonstrates an improvement compared to existing models, achieving a sensitivity of 97.91%, a specificity of 97.69%, an accuracy of 97.80%, and an area under the receiver operating characteristic curve of 99% using the standard 2-fold cross-validation.
The AllerTrans model has been deployed as a web-based prediction tool and is publicly accessible at: https://huggingface.co/spaces/sfaezella/AllerTrans.
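
The pipeline described in the abstract (two embeddings per sequence, combined and classified, then several models ensembled and scored by sensitivity, specificity, and accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the embeddings and probabilities are toy numbers, and soft voting (averaging probabilities) is only one possible ensemble technique; the abstract does not say which one the authors used.

```python
from typing import List, Sequence, Tuple


def concat_features(emb_a: Sequence[float], emb_b: Sequence[float]) -> List[float]:
    """Join two per-sequence embeddings (e.g. from two pLMs) into one vector."""
    return list(emb_a) + list(emb_b)


def soft_vote(prob_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each model's allergen probability, then threshold (1 = allergenic).

    Soft voting is an assumed ensemble rule, not necessarily the authors' choice.
    """
    n_models = len(prob_lists)
    averaged = [sum(column) / n_models for column in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Toy data: three allergens followed by three non-allergens.
y_true = [1, 1, 1, 0, 0, 0]
model_a_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]  # hypothetical classifier A
model_b_probs = [0.7, 0.6, 0.5, 0.1, 0.2, 0.2]  # hypothetical classifier B

y_pred = soft_vote([model_a_probs, model_b_probs])
sens, spec, acc = binary_metrics(y_true, y_pred)
print(y_pred, round(sens, 3), round(spec, 3), round(acc, 3))
```

In this toy case, model A alone would misclassify the fifth sequence as allergenic (0.6), while the averaged probability (0.4) does not, which is the kind of sensitivity/specificity trade-off an ensemble can smooth out.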

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed66/12254128/59d1600f7e40/bpaf040f1.jpg
