Deng Hua, Lou Chaofeng, Wu Zengrui, Li Weihua, Liu Guixia, Tang Yun
Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
iScience. 2022 Aug 17;25(9):104967. doi: 10.1016/j.isci.2022.104967. eCollection 2022 Sep 16.
Accurate and efficient identification of anti-inflammatory peptides (AIPs) is crucial for the treatment of inflammation. Here, we propose a two-layer stacking ensemble model, AIPStack, to effectively predict AIPs. First, we constructed a new dataset for model building and validation. Then, peptide sequences were represented by hybrid features formed by fusing two amino acid composition descriptors. Next, the stacking ensemble model was built with random forest and extremely randomized trees as the base classifiers and logistic regression as the meta-classifier, which takes the outputs of the base classifiers as its input. On independent set 3, AIPStack achieved an AUC of 0.819, an accuracy of 0.755, and an MCC of 0.510, higher than those of other AIP predictors. Furthermore, the essential sequence features were highlighted using the SHapley Additive exPlanations (SHAP) method. AIPStack is anticipated to enable high-throughput AIP prediction and to facilitate hypothesis-driven experimental design.
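The abstract does not name the two amino acid composition descriptors that are fused into the hybrid features. As a purely illustrative assumption, the sketch below uses amino acid composition (AAC, 20 values) and dipeptide composition (DPC, 400 values) and fuses them by concatenation; the function names `aac`, `dpc`, and `hybrid_features` are hypothetical and are not taken from AIPStack.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard residues: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of the sequence made up by each residue."""
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: fraction of adjacent residue pairs equal to each dipeptide."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = {dp: 0 for dp in DIPEPTIDES}
    for pair in pairs:
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[dp] / len(pairs) for dp in DIPEPTIDES]


def hybrid_features(seq: str) -> list[float]:
    """Assumed fusion: concatenate AAC and DPC into one 420-dimensional vector."""
    seq = seq.upper()
    return aac(seq) + dpc(seq)


features = hybrid_features("GLFDIIKKIAESF")
print(len(features))  # 420
```

Concatenation is the simplest way to fuse descriptors. In this sketch, each descriptor block already sums to 1 for sequences made only of standard residues, so neither block dominates purely through scale.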
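The two-layer architecture described above (random forest and extremely randomized trees as base classifiers, logistic regression as the meta-classifier) can be approximated with scikit-learn's `StackingClassifier`. This is a minimal sketch, not the authors' code. It assumes synthetic data in place of the AIP dataset, and the hyperparameters (`n_estimators`, `cv=5`, `max_iter`) are illustrative guesses rather than values reported for AIPStack.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a peptide feature matrix (NOT the AIPStack data).
X, y = make_classification(
    n_samples=400, n_features=420, n_informative=30, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Layer 1: tree-ensemble base classifiers (hyperparameters are assumptions).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
]

# Layer 2: logistic regression fitted on the base classifiers' out-of-fold
# predicted probabilities, produced by 5-fold internal cross-validation.
model = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
pred = model.predict(X_test)
auc = roc_auc_score(y_test, proba)
acc = accuracy_score(y_test, pred)
mcc = matthews_corrcoef(y_test, pred)
print(f"AUC={auc:.3f} ACC={acc:.3f} MCC={mcc:.3f}")
```

Training the meta-classifier on out-of-fold predictions, rather than on predictions for the same samples the base models were fitted on, keeps the second layer from simply learning the base models' overfitting.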
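Accuracy and MCC both derive from the confusion matrix, but MCC is a correlation between predicted and true labels. It ranges from -1 to 1, with 0 for chance-level prediction, which is why an MCC near 0.5 can accompany an accuracy near 0.75. The sketch below computes both metrics from hypothetical confusion-matrix counts that are not results from AIPStack.

```python
import math


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, Matthews correlation coefficient) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts for a 100-peptide test set (illustration only).
acc, mcc = accuracy_and_mcc(tp=40, tn=35, fp=15, fn=10)
print(f"ACC = {acc:.3f}, MCC = {mcc:.3f}")  # ACC = 0.750, MCC = 0.503
```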