Department of Physiology, Ajou University School of Medicine, Suwon, South Korea.
Institute of Molecular Science and Technology, Ajou University, Suwon, South Korea.
Front Immunol. 2018 Jul 31;9:1783. doi: 10.3389/fimmu.2018.01783. eCollection 2018.
Proinflammatory cytokines can amplify the inflammatory response and play a central role in the first line of defence against invading pathogens. Proinflammatory inducing peptides (PIPs) have been used as antineoplastic agents, antibacterial agents, and vaccines in immunization therapies. Advances in sequencing technologies have produced an avalanche of protein sequence data. It is therefore necessary to develop an automated computational method that enables fast and accurate identification of novel PIPs among the vast number of candidate proteins and peptides. To address this, we propose a new predictor, PIP-EL, which predicts PIPs using an ensemble learning (EL) strategy. Because our benchmarking dataset is imbalanced, we applied a random under-sampling technique to generate 10 balanced models for each composition. Technically, PIP-EL is the fusion of 50 independent random forest (RF) models: each of five feature compositions (amino acid, dipeptide, composition-transition-distribution, physicochemical properties, and amino acid index) contributes 10 RF models. PIP-EL achieves a Matthews correlation coefficient (MCC) of 0.435 in 5-fold cross-validation, which is ~2-5% higher than that of the individual classifiers and the hybrid feature-based classifier. Furthermore, on the independent dataset, PIP-EL achieves an MCC of 0.454 and outperforms the existing method and two different machine learning methods developed in this study. These results indicate that PIP-EL will be a useful tool for predicting PIPs and for researchers working in peptide therapeutics and immunotherapy. The user-friendly web server, PIP-EL, is freely accessible.
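The building blocks named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the PIP-EL implementation: the function names (`amino_acid_composition`, `balanced_subsets`, `fuse_scores`, `mcc`) are hypothetical, the random forest training step is omitted, and fusing model outputs by a plain average is an assumption, since the abstract does not state how the 50 models are combined.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(peptide):
    """Fraction of each standard residue in a non-empty peptide (20 values)."""
    counts = Counter(peptide.upper())
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def balanced_subsets(positives, negatives, n_subsets=10, seed=0):
    """Random under-sampling: draw n_subsets class-balanced training sets.

    Each subset holds min(len(positives), len(negatives)) samples per class,
    so the larger class is sampled down to the size of the smaller one.
    """
    rng = random.Random(seed)
    size = min(len(positives), len(negatives))
    return [
        (rng.sample(positives, size), rng.sample(negatives, size))
        for _ in range(n_subsets)
    ]


def fuse_scores(model_scores):
    """Combine per-model scores by simple averaging (an assumption)."""
    return sum(model_scores) / len(model_scores)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = PIP, 0 = non-PIP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Under these assumptions, one classifier would be trained on each of the 10 balanced subsets for each of the five feature compositions, giving the 5 x 10 = 50 models described above. MCC is a natural metric here because it accounts for all four confusion-matrix cells and so remains informative on an imbalanced dataset.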