Zhang Bangyi, Zuo Yun, Wan Jun, Liu Jiayue, Liu Xiangrong, Zeng Xiangxiang, Deng Zhaohong
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China.
Department of Computer Science and Technology, National Institute for Data Science in Health and Medicine, Xiamen Key Laboratory of Intelligent Storage and Computing, Xiamen University, Xiamen, China.
PLoS Comput Biol. 2025 Sep 11;21(9):e1013489. doi: 10.1371/journal.pcbi.1013489. eCollection 2025 Sep.
Cancer remains a major contributor to global mortality and a growing threat to human health. Anticancer peptides (ACPs) have emerged as promising therapeutic agents because of their specific mechanisms of action, pronounced tumor-targeting capability, and low toxicity. Nevertheless, traditional approaches to ACP identification rely on shallow, hand-crafted sequence features, which fail to capture deeper semantic and structural characteristics. Such models also show limited robustness and interpretability when confronted with practical challenges such as severe class imbalance. To address these limitations, this study proposes HyperACP, a framework for ACP recognition that integrates deep representation learning, adaptive sampling, and mechanistic interpretability. The framework uses the ESMC protein language model to extract comprehensive sequence features and employs a novel adaptive algorithm, ANBS, to mitigate class imbalance at the decision boundary. For model transparency, SHAP-Res is incorporated to elucidate the contributions of individual residues to the final predictions. Comprehensive evaluations demonstrate that HyperACP consistently outperforms state-of-the-art methods across multiple datasets and validation protocols (including 10-fold cross-validation and independent test sets), as measured by Accuracy (ACC), Sensitivity (SN), Specificity (SP), Matthews Correlation Coefficient (MCC), and Area Under the Curve (AUC). Furthermore, the model yields biologically interpretable results, pinpointing key residues (K, L, F, G) known to play pivotal roles in anticancer activity. These findings provide not only a robust predictive tool (available at www.hyperacp.com) but also novel insights into the structure-function relationships underlying ACPs.
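For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of their standard textbook definitions for a binary classifier. It is not the authors' evaluation code: the function names `binary_metrics` and `pairwise_auc` and the example counts and scores are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute ACC, SN, SP, and MCC from confusion-matrix counts.

    tp/fn count true ACPs predicted as ACP / non-ACP;
    tn/fp count non-ACPs predicted as non-ACP / ACP.
    """
    total = tp + fn + tn + fp
    acc = (tp + tn) / total
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity (recall on negatives)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is the O(n*m) definition,
    adequate for small illustrative inputs.
    """
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
metrics = binary_metrics(tp=40, fn=10, tn=45, fp=5)
auc = pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(metrics, auc)
```

Because MCC uses all four confusion-matrix cells, it is less easily inflated by a dominant class than ACC, which is why it is commonly reported alongside ACC on imbalanced datasets.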