Nasiri Farid, Atanaki Fereshteh Fallah, Behrouzi Saman, Kavousi Kaveh, Bagheri Mojtaba
Peptide Chemistry Laboratory, Department of Biochemistry, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran 14176-14335, Iran.
Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran 14176-14411, Iran.
ACS Omega. 2021 Jul 25;6(30):19846-19859. doi: 10.1021/acsomega.1c02569. eCollection 2021 Aug 3.
Cell-penetrating anticancer peptides (Cp-ACPs) are considered promising candidates in solid tumor and hematologic cancer therapies. Current approaches for the design and discovery of Cp-ACPs rely on expensive high-throughput screening, which often gives rise to multiple obstacles, including instrumentation adaptation and experimental handling. The application of machine learning (ML) tools developed for peptide activity prediction is therefore of growing interest. In this study, we applied random forest (RF)-, support vector machine (SVM)-, and eXtreme gradient boosting (XGBoost)-based algorithms to predict active Cp-ACPs using an experimentally validated data set. The model, CpACpP, was developed on the basis of two independent subpredictors, one for cell-penetrating peptides (CPPs) and one for anticancer peptides (ACPs). Various compositional and physicochemical features were combined or selected using the multilayered recursive feature elimination (RFE) method for both data sets. Our results showed that the ACP subclassifiers obtained a mean accuracy (ACC) of 0.98 with an area under the curve (AUC) of ≈0.98, whereas the CPP predictors gave corresponding values of ∼0.94 and ∼0.95 via the hybrid-based features and independent data sets, respectively. In addition, evaluating Cp-ACP prediction on a series of independent sequences gave accuracies of ∼0.79 and 0.89 with our CPP and ACP classifiers, respectively, outperforming the previously reported ACPred, mACPpred, MLCPP, and CPPred-RF. The described consensus-based fusion method additionally reached an AUC of 0.94 for the prediction of Cp-ACPs (http://cbb1.ut.ac.ir/CpACpP/Index).
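As a rough illustration of two ideas named in the abstract, a compositional feature and a consensus of the CPP and ACP subpredictors, the following is a minimal Python sketch. It is not the CpACpP implementation: the function names, the 0.5 threshold, and the "take the weaker of the two scores" fusion rule are all assumptions made for this example, since the abstract does not specify how the consensus is formed.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a peptide sequence.

    This is one common compositional feature; the abstract does not list
    the exact features used, so treat it as a representative example.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


def consensus_score(cpp_prob, acp_prob):
    """Hypothetical fusion rule: the weaker subpredictor limits the score.

    cpp_prob and acp_prob stand for the probabilities returned by two
    independently trained classifiers (assumed, not taken from the paper).
    """
    return min(cpp_prob, acp_prob)


def is_cp_acp(cpp_prob, acp_prob, threshold=0.5):
    """Call a peptide a Cp-ACP only if both subpredictors agree.

    The threshold of 0.5 is an assumed default for illustration.
    """
    return consensus_score(cpp_prob, acp_prob) >= threshold
```

Under this assumed rule, a peptide scored 0.9 by the CPP subpredictor but 0.4 by the ACP subpredictor would not be called a Cp-ACP, which reflects the requirement that a candidate be both cell-penetrating and anticancer.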