Manavalan Balachandran, Shin Tae H, Kim Myeong O, Lee Gwang
Department of Physiology, Ajou University School of Medicine, Suwon, South Korea.
Institute of Molecular Science and Technology, Ajou University, Suwon, South Korea.
Front Pharmacol. 2018 Mar 27;9:276. doi: 10.3389/fphar.2018.00276. eCollection 2018.
The use of therapeutic peptides in various inflammatory diseases and autoimmune disorders has received considerable attention; however, identifying anti-inflammatory peptides (AIPs) through wet-lab experimentation is expensive and often time-consuming. Novel computational methods are therefore needed to identify potential AIP candidates prior to experimentation. In this study, we proposed a random forest (RF)-based method for predicting AIPs, called AIPpred (AIP predictor in primary amino acid sequences), which was trained with 354 optimal features. First, we systematically studied the contribution of individual compositions [amino acid composition, dipeptide composition (DPC), amino acid index, chain-transition-distribution, and physicochemical properties] to AIP prediction. Because the DPC-based model performed significantly better than the other composition-based models, we applied a feature selection protocol to this model and identified the optimal features. AIPpred achieved an area under the curve (AUC) value of 0.801 in a 5-fold cross-validation test, ∼2% higher than that of the control RF predictor trained with all DPC features, indicating the efficiency of the feature selection protocol. Furthermore, we evaluated AIPpred on an independent dataset, where it achieved an AUC value of 0.814 and outperformed an existing method as well as 3 different machine learning methods developed in this study. These results indicate that AIPpred will be a useful tool for predicting AIPs and may efficiently assist the development of AIP therapeutics and biomedical research. AIPpred is freely accessible at www.thegleelab.org/AIPpred.
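The abstract mentions the dipeptide composition (DPC) encoding and the AUC metric. The Python sketch below illustrates both: it turns a peptide into a 400-dimensional DPC vector and computes a rank-based AUC. It is not AIPpred's implementation. The normalization by the number of overlapping dipeptides (L − 1), the alphabetical residue ordering, and the function names `dipeptide_composition` and `roc_auc` are assumptions made for this example. Because feature selection was applied to the DPC model, AIPpred's 354 optimal features would presumably be a subset of these 400 dimensions.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes). The alphabetical order is an
# arbitrary choice for this sketch, not necessarily the order AIPpred uses.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20 x 20 = 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> list[float]:
    """Return a 400-dimensional dipeptide composition (DPC) vector.

    Each entry is the count of one dipeptide divided by the number of
    overlapping dipeptides, len(sequence) - 1 (assumed normalization).
    Pairs containing a non-standard residue are not counted in any entry
    but still contribute to the denominator.
    """
    seq = sequence.upper()
    total = len(seq) - 1
    if total < 1:
        # Sequences shorter than two residues contain no dipeptides.
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i : i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive (label 1) receives a
    higher score than a random negative (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    vec = dipeptide_composition("GLFDIVKK")  # 7 overlapping dipeptides
    print(len(vec), vec[DIPEPTIDES.index("GL")])
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a full pipeline, DPC vectors like these would typically be fed to an RF classifier (for example scikit-learn's `RandomForestClassifier`), and `roc_auc` on the predicted probabilities gives the same value as a standard ROC AUC routine. The pairwise loop is quadratic in dataset size, which is acceptable for a sketch but not for large benchmarks.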