Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework.

Charoenkwan Phasit, Chumnanpuen Pramote, Schaduangrat Nalini, Shoombuatong Watshara

Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Thailand.

Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, Thailand.

J Biomol Struct Dyn. 2024 Feb 22:1-13. doi: 10.1080/07391102.2024.2318482.

Plant-allergenic proteins (PAPs) have the potential to induce allergic reactions in certain individuals. While these proteins are generally innocuous for the majority of people, they can elicit an immune response in those with particular sensitivities. Thus, screening and prioritizing the allergenic potential of plant proteins is indispensable for the development of diagnostic tools, therapeutic interventions or medications to treat allergic reactions. However, investigating the allergenic potential of plant proteins based on experimental methods is costly and labour-intensive. Therefore, we developed StackPAP, a three-layer stacking ensemble framework for accurate large-scale identification of PAPs. In StackPAP, at the first layer, we conducted a comprehensive analysis of an extensive set of feature descriptors. Subsequently, we selected and fused five potential sequence-based feature descriptors, including amphiphilic pseudo-amino acid composition, dipeptide deviation from expected mean, amino acid composition, pseudo amino acid composition and dipeptide composition. Additionally, we applied an efficient genetic algorithm (GA-SAR) to determine informative feature sets. In the second layer, 12 powerful machine learning (ML) methods, in combination with all the informative feature sets, were employed to construct a pool of base classifiers. Finally, 13 potential base classifiers were selected using the GA-SAR method and combined to develop the final meta-classifier. Our experimental results revealed the promising prediction performance of StackPAP, with an accuracy, Matthews correlation coefficient and AUC of 0.984, 0.969 and 0.993, respectively, as judged by the independent test dataset. In conclusion, both cross-validation and independent test results indicated the superior performance of StackPAP compared with several ML-based classifiers. To accelerate the identification of the allergenicity of plant proteins, we developed a user-friendly web server for StackPAP (https://pmlabqsar.pythonanywhere.com/StackPAP). We anticipate that StackPAP will be an efficient and useful tool for rapidly screening PAPs from a vast number of plant proteins.
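Two of the five descriptors named in the abstract, amino acid composition and dipeptide composition, have simple standard definitions, so the idea of descriptor fusion can be illustrated with a minimal sketch. The function names (aac, dpc, fused_features), the choice to skip non-standard residues, and fusion by plain concatenation are assumptions made for illustration; this is not the StackPAP code, and the abstract does not specify how the authors encode or fuse features.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:  # assumption: non-standard residues are skipped
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing a non-standard residue are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def fused_features(seq):
    """Hypothetical fusion step: concatenate the two descriptor vectors."""
    return aac(seq) + dpc(seq)


if __name__ == "__main__":
    vector = fused_features("MKTAYIAKQR")  # arbitrary example sequence
    print(len(vector))
```

A vector of this kind would be one input to a feature-selection step and then to the base classifiers; the remaining three descriptors and the GA-SAR selection are not sketched here because the abstract gives no details on them.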

Similar articles

Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework.

J Biomol Struct Dyn. 2024 Feb 22:1-13. doi: 10.1080/07391102.2024.2318482.

Computational prediction of allergenic proteins based on multi-feature fusion.

Front Genet. 2023 Oct 19;14:1294159. doi: 10.3389/fgene.2023.1294159. eCollection 2023.

Pretoria: An effective computational approach for accurate and high-throughput identification of CD8 t-cell epitopes of eukaryotic pathogens.

Int J Biol Macromol. 2023 May 31;238:124228. doi: 10.1016/j.ijbiomac.2023.124228. Epub 2023 Mar 29.

Deepstack-ACE: A deep stacking-based ensemble learning framework for the accelerated discovery of ACE inhibitory peptides.

Methods. 2025 Feb;234:131-140. doi: 10.1016/j.ymeth.2024.12.005. Epub 2024 Dec 19.

TROLLOPE: A novel sequence-based stacked approach for the accelerated discovery of linear T-cell epitopes of hepatitis C virus.

PLoS One. 2023 Aug 25;18(8):e0290538. doi: 10.1371/journal.pone.0290538. eCollection 2023.

SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins.

Comput Biol Med. 2022 Jul;146:105704. doi: 10.1016/j.compbiomed.2022.105704. Epub 2022 Jun 7.

Stack-AVP: A Stacked Ensemble Predictor Based on Multi-view Information for Fast and Accurate Discovery of Antiviral Peptides.

J Mol Biol. 2025 Mar 15;437(6):168853. doi: 10.1016/j.jmb.2024.168853. Epub 2024 Nov 6.

StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens.

BMC Bioinformatics. 2023 Jul 28;24(1):301. doi: 10.1186/s12859-023-05421-x.

StackDPPIV: A novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides.

Methods. 2022 Aug;204:189-198. doi: 10.1016/j.ymeth.2021.12.001. Epub 2021 Dec 6.

pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm.

Front Cell Dev Biol. 2022 May 24;10:894874. doi: 10.3389/fcell.2022.894874. eCollection 2022.

Cited by

M3S-GRPred: a novel ensemble learning approach for the interpretable prediction of glucocorticoid receptor antagonists using a multi-step stacking strategy.

BMC Bioinformatics. 2025 Apr 30;26(1):117. doi: 10.1186/s12859-025-06132-1.

DeepBP: Ensemble deep learning strategy for bioactive peptide prediction.

BMC Bioinformatics. 2024 Nov 11;25(1):352. doi: 10.1186/s12859-024-05974-5.
