Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK.
Int J Mol Sci. 2023 Oct 22;24(20):15447. doi: 10.3390/ijms242015447.
Anticancer peptides (ACPs) have been shown to possess potent anticancer activity. Although computational methods have emerged for rapid ACP identification, their accuracy still needs improvement. In this study, we propose ACP-BC, a three-channel end-to-end model that uses various combinations of data augmentation techniques. In the first channel, features are extracted from the raw sequence using a bidirectional long short-term memory network. In the second channel, the entire sequence is converted into a chemical molecular formula, which is further simplified using Simplified Molecular Input Line Entry System (SMILES) notation, and deep abstract features are then obtained through a Bidirectional Encoder Representations from Transformers (BERT) model. In the third channel, we use four hand-selected features: dipeptide composition, binary profile feature, k-mer sparse matrix, and pseudo amino acid composition. Notably, applying a chemical BERT to ACP prediction is novel, and it is successfully integrated into our model. To validate the performance of our model, we selected two benchmark datasets, ACPs740 and ACPs240. ACP-BC achieved prediction accuracies of 87% and 90% on these two datasets, respectively, representing improvements of 1.3% and 7% over existing state-of-the-art methods on the same datasets. Systematic comparative experiments therefore show that ACP-BC can effectively identify anticancer peptides.
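Two of the hand-selected features in the third channel, dipeptide composition and the binary profile feature, can be illustrated with a minimal Python sketch. It follows the commonly used textbook definitions of these descriptors and is not the authors' implementation: the function names, the example peptide `FLPKLF`, and the `max_len=7` window are all assumptions chosen for illustration, and the abstract does not state how ACP-BC pads or truncates sequences.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dipeptide_composition(seq):
    """Return a 400-dim vector of dipeptide frequencies (hypothetical helper)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:  # sequence too short or no valid pairs
        return [0.0] * len(DIPEPTIDES)
    # Normalize by the number of valid adjacent pairs.
    return [counts[d] / total for d in DIPEPTIDES]


def binary_profile(seq, max_len=7):
    """One-hot encode the first max_len residues; shorter peptides are zero-padded.

    The window length is an assumption, not a value taken from the paper.
    """
    profile = []
    for i in range(max_len):
        row = [0] * len(AMINO_ACIDS)
        if i < len(seq) and seq[i] in AMINO_ACIDS:
            row[AMINO_ACIDS.index(seq[i])] = 1
        profile.extend(row)
    return profile


# Example peptide chosen only for illustration.
dpc = dipeptide_composition("FLPKLF")
bpf = binary_profile("FLPKLF", max_len=7)
```

In this sketch the dipeptide vector always has 400 entries regardless of peptide length, while the binary profile has `20 * max_len` entries, which is why a fixed window is needed before such features can be concatenated and fed to a downstream classifier.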