HyperACP: A cutting-edge hybrid framework for anticancer peptide classification via scalable feature extraction and adaptive neighbor-based synthesis.

Authors

Zhang Bangyi, Zuo Yun, Wan Jun, Liu Jiayue, Liu Xiangrong, Zeng Xiangxiang, Deng Zhaohong

Affiliations

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China.

Department of Computer Science and Technology, National Institute for Data Science in Health and Medicine, Xiamen Key Laboratory of Intelligent Storage and Computing, Xiamen University, Xiamen, China.

Publication information

PLoS Comput Biol. 2025 Sep 11;21(9):e1013489. doi: 10.1371/journal.pcbi.1013489. eCollection 2025 Sep.

DOI:10.1371/journal.pcbi.1013489

PMID:40934273

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12443255/

Abstract

Cancer remains a major contributor to global mortality, constituting a significant and escalating threat to human health. Anticancer peptides (ACPs) have emerged as promising therapeutic agents due to their specific mechanisms of action, pronounced tumor-targeting capability, and low toxicity. Nevertheless, traditional approaches for ACP identification are constrained by their reliance on shallow, hand-crafted sequence features, which fail to capture deeper semantic and structural characteristics. Moreover, such models exhibit limited robustness and interpretability when confronted with practical challenges such as severe class imbalance. To address these limitations, this study proposes HyperACP, an innovative framework for ACP recognition that integrates deep representation learning, adaptive sampling, and mechanistic interpretability. The framework leverages the ESMC protein language model to extract comprehensive sequence features and employs a novel adaptive algorithm, ANBS, to mitigate class imbalance at the decision boundary. For enhanced model transparency, SHAP-Res is incorporated to elucidate the contributions of individual residues to the final predictions. Comprehensive evaluations demonstrate that HyperACP consistently outperforms state-of-the-art methods across multiple datasets and validation protocols (10-fold cross-validation and independent test sets), as measured by Accuracy (ACC), Sensitivity (SN), Specificity (SP), Matthews Correlation Coefficient (MCC), and Area Under the Curve (AUC). Furthermore, the model yields biologically interpretable results, pinpointing key residues (K, L, F, G) known to play pivotal roles in anticancer activity. These findings provide not only a robust predictive tool (available at www.hyperacp.com) but also novel insights into the structure-function relationships underlying ACPs.
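
The five metrics named in the abstract have standard definitions, and a short sketch may help show why accuracy alone can mislead under the class imbalance the abstract describes. The code below is not taken from HyperACP: the function names (binary_metrics, roc_auc) and the confusion-matrix counts are hypothetical, chosen only for illustration, and the zero-denominator fallbacks are a common convention rather than anything the paper specifies.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute ACC, SN, SP and MCC from binary confusion-matrix counts.

    Positive class = ACP, negative class = non-ACP.
    Ratios with a zero denominator are reported as 0.0 (a convention).
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity (recall on ACPs)
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity (recall on non-ACPs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


def roc_auc(y_true, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced test set: 10 ACPs and 90 non-ACPs.
# A classifier that labels every peptide "non-ACP" reaches ACC = 0.9
# while finding no ACPs at all (SN = 0.0, MCC = 0.0).
trivial = binary_metrics(tp=0, fp=0, tn=90, fn=10)
print(trivial)

# A classifier that recovers 8 of the 10 ACPs at the cost of 9 false alarms
# has slightly lower accuracy but far better SN and MCC.
useful = binary_metrics(tp=8, fp=9, tn=81, fn=2)
print(useful)

# AUC is computed from continuous scores rather than hard labels.
print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))
```

In this made-up example the trivial classifier wins on accuracy yet is useless for finding ACPs, which is why imbalance-aware metrics such as SN and MCC are reported alongside ACC.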




