PredIL13: Stacking a variety of machine and deep learning methods with ESM-2 language model for identifying IL13-inducing peptides.

Affiliation

Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan.

Publication information

PLoS One. 2024 Aug 22;19(8):e0309078. doi: 10.1371/journal.pone.0309078. eCollection 2024.

Abstract

Interleukin (IL)-13 has emerged as one of the recently identified cytokines. Because IL-13 contributes to the severity of COVID-19 and alters crucial biological processes, there is an urgent need to explore novel molecules or peptides capable of inducing IL-13. Computational prediction has received attention as a complement to in vivo and in vitro experimental identification of IL-13-inducing peptides, because experimental identification is time-consuming, laborious, and expensive. A few computational tools have been presented, including IL13Pred and iIL13Pred. To increase prediction capability, we have developed PredIL13, an ensemble learning method that incorporates the latest ESM-2 protein language model. This method stacks the probability scores output by 168 single-feature machine/deep learning models and then trains a logistic regression-based meta-classifier on the stacked probability score vectors. The key techniques were implementing ESM-2 and selecting the optimal single-feature models according to their absolute weight coefficient for logistic regression (AWCLR), an indicator of the importance of each single-feature model. In particular, the sequential deletion of single-feature models based on the iterative AWCLR ranking (SDIWC) method, applied while considering the model's accuracy, constructed a meta-classifier consisting of the top 16 single-feature models, named PredIL13. PredIL13 greatly outperformed the state-of-the-art predictors and is thus an invaluable tool for accelerating the detection of IL13-inducing peptides within the human genome.
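
The stacking and AWCLR-based deletion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the probability scores are synthetic toy data, the function names (`fit_logreg`, `sequential_deletion`, `make_toy_scores`) are hypothetical, a plain gradient-descent logistic regression stands in for whatever solver the paper used, and the loop stops at a fixed number of models rather than monitoring accuracy as the abstract indicates. Each column of `X` plays the role of one single-feature model's probability score.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=0.5, epochs=150):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    # Centre each column. This only helps convergence; the optimal
    # weights are unchanged and the offset is folded into the bias below.
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(Xc, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    b -= sum(wj * mj for wj, mj in zip(w, means))
    return w, b


def predict_proba(row, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


def sequential_deletion(X, y, keep):
    """Refit the meta-classifier repeatedly, dropping the base model with
    the smallest absolute weight each round (assumed fixed stopping size)."""
    active = list(range(len(X[0])))
    while True:
        Xa = [[row[j] for j in active] for row in X]
        w, b = fit_logreg(Xa, y)
        if len(active) <= keep:
            return active, w, b
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]


def make_toy_scores(n=300, n_models=8, n_informative=3, seed=0):
    """Synthetic stand-in for base-model probability scores (not real data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = []
        for j in range(n_models):
            if j < n_informative:
                centre = 0.75 if label else 0.25
                row.append(min(1.0, max(0.0, rng.gauss(centre, 0.2))))
            else:
                row.append(rng.random())  # uninformative base model
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_scores()
selected, weights, bias = sequential_deletion(X, y, keep=3)
train_acc = sum(
    (predict_proba([row[j] for j in selected], weights, bias) >= 0.5) == bool(label)
    for row, label in zip(X, y)
) / len(y)
print("kept base models:", selected, "training accuracy:", round(train_acc, 3))
```

Refitting after every deletion matters because the remaining weights shift once a base model is removed, which is why the ranking is described as iterative rather than computed once.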

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5492/11340954/91c5951b5df0/pone.0309078.g001.jpg
