Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand.
Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, 10400, Thailand.
Sci Rep. 2021 Feb 4;11(1):3017. doi: 10.1038/s41598-021-82513-9.
As anticancer peptides (ACPs) have attracted great interest for cancer treatment, several machine learning-based approaches have been proposed for ACP identification. Although existing methods achieve high prediction accuracies, they rely on a large number of descriptors together with complex ensemble approaches. This leads to low interpretability and poses a challenge for biologists and biochemists. It is therefore desirable to develop a simple, interpretable and efficient predictor that identifies ACPs accurately and also provides a means for the rational design of new anticancer peptides with promising potential for clinical application. Herein, we propose a novel flexible scoring card method (FSCM) that uses propensity scores of local and global sequential information to develop a sequence-based ACP predictor (named iACP-FSCM) with improved prediction accuracy and model interpretability. To the best of our knowledge, iACP-FSCM is the first sequence-based ACP predictor that uses FSCM-derived propensity scores to provide an in-depth understanding of the molecular basis for the enhanced anticancer activity of peptides. Independent testing showed that iACP-FSCM achieved accuracies of 0.825 and 0.910 on the main and alternative datasets, respectively. Comparative benchmarking demonstrated that iACP-FSCM outperformed seven existing ACP predictors on the main dataset, with marked improvements of 7% and 17% in accuracy and MCC, respectively. Furthermore, iACP-FSCM (0.910) achieved results highly comparable to those of the state-of-the-art ensemble model AntiCP2.0 (0.920) on the alternative dataset. The comparative results demonstrated that, given its simplicity, interpretability and generalizability, iACP-FSCM was the most suitable choice for ACP identification and characterization.
iACP-FSCM is anticipated to be a robust tool for the rapid screening and identification of promising ACPs for clinical use.
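To make the scoring-card idea and the reported metrics more concrete, the following is a minimal Python sketch of a generic dipeptide scoring card with accuracy and MCC evaluation. It is an illustration under stated assumptions, not iACP-FSCM. The smoothed log-odds propensity estimate, the decision threshold of 0, the function names (`log_odds_propensity`, `score`, `predict`, `accuracy_and_mcc`) and the toy peptides are all hypothetical. The sketch also does not reproduce how FSCM derives or combines its local and global propensity scores.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptides(seq):
    """All overlapping dipeptides of a peptide sequence."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def log_odds_propensity(pos_seqs, neg_seqs, pseudocount=1.0):
    """Hypothetical propensity table (assumption, not the FSCM procedure):
    smoothed log-ratio of each dipeptide's frequency in positive (ACP)
    versus negative (non-ACP) training peptides."""
    pos = Counter(d for s in pos_seqs for d in dipeptides(s))
    neg = Counter(d for s in neg_seqs for d in dipeptides(s))
    pos_total = sum(pos.values()) + pseudocount * len(DIPEPTIDES)
    neg_total = sum(neg.values()) + pseudocount * len(DIPEPTIDES)
    return {
        d: math.log((pos[d] + pseudocount) / pos_total)
        - math.log((neg[d] + pseudocount) / neg_total)
        for d in DIPEPTIDES
    }


def score(seq, propensity):
    """Scoring-card score: mean propensity over the peptide's dipeptides."""
    dps = [d for d in dipeptides(seq) if d in propensity]
    return sum(propensity[d] for d in dps) / len(dps) if dps else 0.0


def predict(seqs, propensity, threshold=0.0):
    """Label a peptide as ACP (1) when its score exceeds the (assumed) threshold."""
    return [1 if score(s, propensity) > threshold else 0 for s in seqs]


def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Toy peptides invented for illustration only; not data from the paper.
    acps = ["KLAKLAKKLAKLAK", "FLKKLLKKLL", "KWKLFKKIGK"]
    non_acps = ["DEEDGSSGDE", "SGDTEEPSGD", "GDSEEDTGSS"]
    table = log_odds_propensity(acps, non_acps)
    seqs = acps + non_acps
    labels = [1] * len(acps) + [0] * len(non_acps)
    acc, mcc = accuracy_and_mcc(labels, predict(seqs, table))
    print(f"accuracy={acc:.3f} MCC={mcc:.3f}")  # prints accuracy=1.000 MCC=1.000
```

Because the toy run trains and evaluates on the same six peptides, its perfect scores say nothing about real performance; the accuracies reported above come from independent test sets. The point of the design is interpretability: every prediction decomposes into per-dipeptide propensity values that can be inspected directly.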