Harun-Or-Roshid Md, Kurata Hiroyuki
Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
Sci Rep. 2025 Jul 1;15(1):21213. doi: 10.1038/s41598-025-05491-2.
Interleukin-6 (IL-6) is a cytokine with diverse biological activities that contributes to a variety of physiological and immune responses. IL-6-inducing peptides are short protein fragments that play an important role in these biological processes. Extensive research has advanced the development of IL-6-inducing peptides, but identifying them experimentally remains time-consuming, labor-intensive, and costly, so computational prediction has gained attention as an alternative. Several computational methods have already been developed, but they suffer from insufficient accuracy and inadequate feature engineering. In this study, we developed PredIL6, an ensemble learning model that identifies IL-6-inducing peptides by combining probability scores from 148 baseline machine learning and deep learning models using a genetic algorithm-based meta-classifier. A forward feature selection method was used to construct the ensemble, which consists of 20 baseline (single-feature) models, with features including AAINDEX, BLOSUM62, and language models (ESM-2 and word2vec). PredIL6 outperformed existing state-of-the-art methods, achieving accuracies of 0.934 and 0.899 on the training and test sets, respectively. PredIL6 is therefore a useful tool for expediting the identification of IL-6-inducing peptides. A freely available web application and a standalone PredIL6 program are provided.
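The ensemble strategy described above (forward selection over baseline models, with their probability scores combined by a genetic algorithm) can be illustrated with a minimal sketch. This is not the PredIL6 implementation: the abstract does not give the genetic algorithm's details, so the weighted-average combination, the 0.5 threshold, the GA settings, the synthetic scores, and every function name below (`ensemble_accuracy`, `ga_weights`, `forward_select`) are assumptions made purely for illustration.

```python
import random

def ensemble_accuracy(weights, scores, labels):
    """Accuracy of a weighted average of per-model probabilities (threshold 0.5)."""
    total = sum(weights) or 1.0  # guard against an all-zero weight vector
    correct = 0
    for i, y in enumerate(labels):
        p = sum(w * s[i] for w, s in zip(weights, scores)) / total
        correct += int((p >= 0.5) == bool(y))
    return correct / len(labels)

def ga_weights(scores, labels, rng, pop_size=20, generations=20, mutation=0.2):
    """Toy genetic algorithm (assumed design): evolve weights in [0, 1]."""
    n = len(scores)

    def fitness(w):
        return ensemble_accuracy(w, scores, labels)

    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [
                min(1.0, max(0.0, w + rng.gauss(0, 0.1)))  # clipped Gaussian mutation
                if rng.random() < mutation else w
                for w in child
            ]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

def forward_select(all_scores, labels, rng, max_models=3):
    """Greedily add the baseline model whose inclusion most improves accuracy."""
    selected, best_w, best_acc = [], [], 0.0
    while len(selected) < max_models:
        trials = []
        for m in range(len(all_scores)):
            if m in selected:
                continue
            subset = [all_scores[j] for j in selected + [m]]
            w, acc = ga_weights(subset, labels, rng)
            trials.append((acc, m, w))
        if not trials:
            break
        acc, m, w = max(trials, key=lambda t: t[0])
        if acc <= best_acc:  # stop when no candidate improves the ensemble
            break
        selected.append(m)
        best_w, best_acc = w, acc
    return selected, best_w, best_acc

# Synthetic stand-in for baseline-model outputs: five "models" of varying noise.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(200)]
noise_levels = [0.2, 0.3, 0.4, 0.6, 0.8]
all_scores = [
    [min(1.0, max(0.0, 0.2 + 0.6 * y + rng.gauss(0, s))) for y in labels]
    for s in noise_levels
]

selected, weights, acc = forward_select(all_scores, labels, rng)
print("selected models:", selected)
print("weights:", [round(w, 3) for w in weights])
print("ensemble accuracy:", round(acc, 3))
```

In this sketch the fitness is measured on the same data used to fit the weights, which keeps the example short; a real pipeline would score candidates with cross-validation and report performance on a held-out test set, as the abstract does.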