Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China.
School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China.
Int J Mol Sci. 2023 Feb 21;24(5):4328. doi: 10.3390/ijms24054328.
Cancer is one of the leading threats to human life and health worldwide. Peptide-based therapies have attracted considerable attention in recent years, so accurate prediction of anticancer peptides (ACPs) is crucial for discovering and designing novel cancer treatments. In this study, we propose a novel machine learning framework (GRDF) that combines deep graphical representation with a deep forest architecture to identify ACPs. Specifically, GRDF extracts graphical features based on the physicochemical properties of peptides and integrates them with evolutionary information and binary profiles to construct models. We employ the deep forest algorithm, which adopts a layer-by-layer cascade architecture similar to that of deep neural networks and performs well on small datasets without complicated hyperparameter tuning. Experiments show that GRDF achieves state-of-the-art performance on two carefully constructed datasets (Set 1 and Set 2), reaching 77.12% accuracy and 77.54% F1-score on Set 1, and 94.10% accuracy and 94.15% F1-score on Set 2, exceeding existing ACP prediction methods. Our models are also more robust than the baseline algorithms commonly used for other sequence analysis tasks. In addition, GRDF is readily interpretable, enabling researchers to better understand the features of peptide sequences. These results indicate that GRDF is effective in identifying ACPs. The framework presented in this study could therefore help researchers discover anticancer peptides and contribute to the development of novel cancer treatments.
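The abstract does not specify how the binary profiles are built. As a minimal sketch, assuming the common convention of one-hot encoding each residue over the 20 standard amino acids within a fixed window of N-terminal residues (zero-padded for shorter peptides), the feature could look like the following. The function name `binary_profile`, the `window` parameter, and its default value are hypothetical and are not taken from the GRDF implementation.

```python
# Illustrative sketch only: an assumed one-hot "binary profile" encoding,
# not the encoding confirmed by the GRDF authors.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def binary_profile(peptide: str, window: int = 10) -> list[int]:
    """Encode the first `window` residues as concatenated 20-dim one-hot vectors.

    Positions beyond the end of the peptide, and residues outside the
    20 standard amino acids, are encoded as all-zero vectors.
    """
    features: list[int] = []
    for pos in range(window):
        one_hot = [0] * len(AMINO_ACIDS)
        if pos < len(peptide):
            idx = AA_INDEX.get(peptide[pos].upper())
            if idx is not None:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features


if __name__ == "__main__":
    vec = binary_profile("FLPK", window=5)
    print(len(vec), sum(vec))
```

A fixed window keeps the feature length constant across peptides of different lengths, which is what a forest-based classifier needs as input. The window size is a free choice in this sketch.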