pACP-HybDeep: predicting anticancer peptides using binary tree growth based transformer and structural feature encoding with deep-hybrid learning.

Author information

Hayat Maqsood, Alghamdi Wajdi, Akbar Shahid, Raza Ali, Kadir Rabiah Abdul, Sarker Mahidur R

Affiliations

Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan.

Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.

Publication information

Sci Rep. 2025 Jan 2;15(1):565. doi: 10.1038/s41598-024-84146-0.

Cancer remains a major health concern worldwide because of its high mortality rate. Although numerous traditional therapies and wet-laboratory methods exist for treating cancer-affected cells, these approaches often face limitations, including high costs and substantial side effects. Recently, peptides have attracted significant attention from scientists because their high selectivity enables reliable targeted action with minimal adverse effects. Building on the significant outcomes of existing computational models, we propose a highly reliable and effective model, pACP-HybDeep, for the accurate prediction of anticancer peptides. In this model, training peptides are numerically encoded using an attention-based ProtBERT-BFD encoder, which extracts semantic features, together with CTDT-based structural information. A k-nearest neighbor-based binary tree growth (BTG) algorithm is then employed to select an optimal feature set from this multi-perspective vector. The selected feature vector is subsequently used to train a CNN + RNN-based deep learning model. The proposed pACP-HybDeep model achieved a training accuracy of 95.33% and an AUC of 0.97. To validate its generalization capability, pACP-HybDeep was evaluated on the independent datasets Ind-S1, Ind-S2, and Ind-S3, achieving accuracies of 94.92%, 92.26%, and 91.16%, respectively. The efficacy and reliability demonstrated on these test datasets establish pACP-HybDeep as a valuable tool for researchers in academia and pharmaceutical drug design.
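In common feature-extraction toolkits, CTDT denotes the "transition" part of the composition/transition/distribution (CTD) descriptor family. Each amino acid is assigned to one of three groups per physicochemical property, and each feature is the fraction of adjacent residue pairs that switch between two given groups. The following is a minimal sketch of that calculation under stated assumptions, not the authors' implementation. The `PROPERTY_GROUPS` table is hypothetical: it uses two properties (hydrophobicity and charge) with a commonly used three-way split. The function name `ctdt_features` is also hypothetical, and the abstract does not specify which properties pACP-HybDeep actually encodes.

```python
from typing import Dict, List

# Hypothetical three-group partitions of the 20 standard amino acids.
# ASSUMPTION: a commonly used CTD grouping for two illustrative properties;
# the abstract does not state the property set pACP-HybDeep uses.
PROPERTY_GROUPS: Dict[str, List[str]] = {
    "hydrophobicity": ["RKEDQN", "GASTPHY", "CLVIMFW"],  # polar / neutral / hydrophobic
    "charge": ["KR", "ANCQGHILMFPSTWYV", "DE"],          # positive / neutral / negative
}

PAIR_ORDER = ((1, 2), (1, 3), (2, 3))


def ctdt_features(sequence: str) -> List[float]:
    """Return CTD transition features: for each property, the fraction of
    adjacent residue pairs that switch between groups 1<->2, 1<->3 and 2<->3."""
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("CTDT needs at least two residues")
    n_pairs = len(seq) - 1
    features: List[float] = []
    for groups in PROPERTY_GROUPS.values():
        # Map each residue to its group index (1, 2 or 3) for this property.
        lookup = {aa: idx + 1 for idx, members in enumerate(groups) for aa in members}
        try:
            labels = [lookup[aa] for aa in seq]
        except KeyError as err:
            raise ValueError(f"non-standard residue: {err.args[0]}") from None
        counts = {pair: 0 for pair in PAIR_ORDER}
        for a, b in zip(labels, labels[1:]):
            if a != b:
                # Transitions are undirected: 2->1 is counted as 1<->2.
                counts[(min(a, b), max(a, b))] += 1
        features.extend(counts[pair] / n_pairs for pair in PAIR_ORDER)
    return features


if __name__ == "__main__":
    print(ctdt_features("RAK"))  # prints [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Each property contributes three values, so this example yields six features per peptide regardless of its length. A fuller property set would add three values per property, and these values would then be concatenated with the ProtBERT-BFD embedding to form the multi-perspective vector.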
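The feature-selection step pairs a search over binary feature masks with a k-nearest neighbor (k-NN) classifier that scores each candidate subset. The sketch below shows one common form of such a k-NN wrapper fitness: a weighted sum of leave-one-out k-NN error and the fraction of features kept. This form is an assumption, not the paper's stated objective. The weight `alpha = 0.99`, `k = 3`, the toy data, and all function names are hypothetical. The BTG search procedure itself is not reproduced here. Instead, `best_mask_exhaustive` simply tries every mask. That approach only works for a handful of features, so it stands in for the metaheuristic that makes the search tractable on a high-dimensional feature vector.

```python
import math
from itertools import product
from typing import List, Sequence


def knn_loo_error(X: Sequence[Sequence[float]], y: Sequence[int],
                  mask: Sequence[int], k: int = 3) -> float:
    """Leave-one-out error rate of a k-NN classifier restricted to the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0  # an empty feature subset cannot classify anything
    errors = 0
    for i in range(len(X)):
        # Euclidean distance from sample i to every other sample on the selected columns.
        neighbours = sorted(
            (math.dist([X[i][j] for j in cols], [X[m][j] for j in cols]), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote among k neighbours
        errors += int(predicted != y[i])
    return errors / len(X)


def wrapper_fitness(X, y, mask, alpha: float = 0.99, k: int = 3) -> float:
    """Hypothetical fitness (lower is better): weighted k-NN error plus feature ratio."""
    ratio = sum(mask) / len(mask)
    return alpha * knn_loo_error(X, y, mask, k) + (1 - alpha) * ratio


def best_mask_exhaustive(X, y, alpha: float = 0.99, k: int = 3) -> List[int]:
    """Stand-in for BTG: evaluates every non-empty mask (feasible only for few features)."""
    n_features = len(X[0])
    candidates = [list(m) for m in product((0, 1), repeat=n_features) if any(m)]
    return min(candidates, key=lambda m: wrapper_fitness(X, y, m, alpha, k))


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.0], [1.1, 8.0], [1.2, 2.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(best_mask_exhaustive(X, y))  # prints [1, 0]
```

The small `(1 - alpha)` term acts as a tie-breaker. Among subsets with equal k-NN error, the one with fewer features scores lower, which is why a wrapper of this kind tends to shrink the feature vector rather than keep everything.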
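The reported accuracy and AUC are standard binary-classification metrics. The following sketch is a generic illustration, not code from the paper. It computes accuracy as (TP + TN) / N. It computes AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half; this quantity equals the area under the ROC curve. Three choices are assumptions made for illustration: labeling anticancer peptides as class 1, the 0.5 decision cutoff, and the function names.

```python
from typing import List, Sequence


def to_labels(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Convert model scores to class labels (1 = predicted anticancer peptide)."""
    return [int(s >= cutoff) for s in scores]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """(TP + TN) / N: fraction of peptides whose predicted class matches the label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative; ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(accuracy(labels, to_labels(scores)))  # prints 0.5
    print(roc_auc(labels, scores))              # prints 0.75
```

The toy example shows why the two numbers can differ. Accuracy depends on the chosen cutoff, whereas AUC measures how well the scores rank positives above negatives across all cutoffs. The pairwise loop costs O(P * N) comparisons, so a rank-based formula or a library routine is preferable for large test sets.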
