Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan.
PLoS One. 2024 Aug 22;19(8):e0309078. doi: 10.1371/journal.pone.0309078. eCollection 2024.
Interleukin (IL)-13 has emerged as one of the recently identified cytokines. Because IL-13 contributes to the severity of COVID-19 and alters crucial biological processes, there is an urgent need to explore novel molecules or peptides capable of inducing IL-13. Computational prediction has received attention as a complement to in vivo and in vitro experimental identification of IL-13-inducing peptides, because experimental identification is time-consuming, laborious, and expensive. A few computational tools have been presented, including IL13Pred and iIL13Pred. To increase prediction capability, we developed PredIL13, an ensemble learning method that incorporates the latest ESM-2 protein language model. The method stacks the probability scores output by 168 single-feature machine/deep learning models and then trains a logistic regression-based meta-classifier on the stacked probability score vectors. The key techniques were incorporating ESM-2 and selecting the optimal single-feature models according to their absolute weight coefficient for logistic regression (AWCLR), an indicator of the importance of each single-feature model. In particular, sequential deletion of single-feature models based on the iterative AWCLR ranking (SDIWC) produced a meta-classifier consisting of the top 16 single-feature models, named PredIL13, while taking the model's accuracy into account. PredIL13 greatly outperformed the state-of-the-art predictors and is thus an invaluable tool for accelerating the detection of IL-13-inducing peptides within the human genome.
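The stacking and selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the helper names (`toy_scores`, `center`, `fit_logreg`, `sdiwc`), the plain gradient-descent logistic regression, and the centering of scores are all assumptions made for illustration. It uses 6 synthetic base-model score columns instead of 168 real ones, omits ESM-2 entirely, and stops at a fixed number of models rather than reproducing how the paper weighs accuracy during deletion.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def toy_scores(n=200, seed=0):
    """Synthetic stand-in for base-model probability scores (hypothetical data).

    Columns 0-1 mimic informative single-feature models; columns 2-5 are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        mean = 0.7 if label else 0.3
        informative = [min(1.0, max(0.0, rng.gauss(mean, 0.2))) for _ in range(2)]
        noise = [rng.random() for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def center(X):
    # Subtract column means so the intercept does not leak into the weights
    # (an assumption of this sketch, not something stated in the abstract).
    d = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def fit_logreg(X, y, lr=0.5, epochs=200):
    """Logistic-regression meta-classifier fit by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(X, y, w, b):
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        hits += int((p >= 0.5) == bool(yi))
    return hits / len(y)


def sdiwc(X, y, keep):
    """Refit, rank by absolute weight, drop the weakest column; repeat.

    Returns the indices of the `keep` surviving base models.
    """
    active = list(range(len(X[0])))
    while len(active) > keep:
        Xa = [[row[j] for j in active] for row in X]
        w, _ = fit_logreg(Xa, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        active.pop(weakest)
    return active


if __name__ == "__main__":
    X, y = toy_scores()
    Xc = center(X)
    selected = sdiwc(Xc, y, keep=2)
    w, b = fit_logreg([[row[j] for j in selected] for row in Xc], y)
    print("selected base models:", selected)
    print("training accuracy:", accuracy([[row[j] for j in selected] for row in Xc], y, w, b))
```

Refitting after every deletion, rather than ranking once, matters because the weights of the remaining models shift when a correlated model is removed. In practice the scores fed to the meta-classifier would come from cross-validated base models, and accuracy would be measured on held-out data rather than on the training set as in this toy run.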