College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China.
BMC Bioinformatics. 2021 Oct 20;22(1):512. doi: 10.1186/s12859-021-04433-9.
Anticancer peptides are defence substances with innate immune functions that can selectively act on cancer cells without harming normal cells, and many studies have been conducted to identify them. In this paper, we introduce anticancer peptide secondary structures as additional features and propose an effective computational model, CL-ACP, that uses a combined network and an attention mechanism to predict anticancer peptides.
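As a rough illustration of using secondary structure as an extra input alongside the sequence, the sketch below one-hot encodes both and stacks them residue by residue. The abstract does not state CL-ACP's encoding, so the one-hot scheme, the three-state H/E/C alphabet, the fixed length `max_len`, and the helper names `one_hot` and `build_feature_space` are all assumptions. The secondary-structure string is assumed to come from an external predictor, and the example string is made up.

```python
# Hypothetical alphabets: the 20 standard amino acids and a 3-state
# secondary-structure code (H = helix, E = strand, C = coil).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_STATES = "HEC"


def one_hot(symbols: str, alphabet: str, max_len: int) -> list[list[float]]:
    """One-hot encode `symbols`, truncating or zero-padding to `max_len` rows.

    Raises ValueError (from str.index) on symbols outside `alphabet`.
    """
    rows = []
    for i in range(max_len):
        row = [0.0] * len(alphabet)
        if i < len(symbols):
            row[alphabet.index(symbols[i])] = 1.0
        rows.append(row)
    return rows


def build_feature_space(seq: str, ss: str, max_len: int = 50) -> list[list[float]]:
    """Per-residue concatenation of sequence and secondary-structure encodings."""
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary structure must have the same length")
    seq_x = one_hot(seq, AMINO_ACIDS, max_len)
    ss_x = one_hot(ss, SS_STATES, max_len)
    return [a + b for a, b in zip(seq_x, ss_x)]


# Illustrative input: the secondary-structure string is made up, not predicted.
features = build_feature_space("KLAKLAK", "CHHHHHC", max_len=16)
print(len(features), len(features[0]))  # 16 23
```

Each residue row thus carries 20 sequence channels followed by 3 structure channels, and padded positions are all zeros.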
The CL-ACP model uses the secondary structures and original sequences of anticancer peptides to construct the feature space. A long short-term memory (LSTM) network and a convolutional neural network (CNN) are used to extract the contextual dependencies and local correlations of the feature space, respectively. Furthermore, a multi-head self-attention mechanism is used to enhance the representation of the anticancer peptide sequences. Finally, the three categories of feature information are cascaded and classified. CL-ACP was validated on two types of datasets (anticancer peptide datasets and antimicrobial peptide datasets), on which it compared favourably with previous methods, achieving the highest AUC values of 0.935 and 0.972 on the anticancer peptide and antimicrobial peptide datasets, respectively.
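The layered description above can be sketched as a NumPy forward pass. Everything beyond the component list in the abstract is an assumption made for illustration: one-hot inputs, the layer sizes, a single CNN layer with global max pooling, a single-layer LSTM whose last hidden state is kept, attention applied to the sequence channels only, plain concatenation as the "cascade", and a sigmoid output. The weights are random and untrained, so the printed probability carries no meaning; the sketch only shows how three feature groups can be produced and joined. It is not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_maxpool(x, w, b):
    """Valid 1-D convolution along the residue axis, ReLU, then global max pooling.

    x: (L, C_in), w: (k, C_in, C_out), b: (C_out,)  ->  (C_out,)
    """
    k = w.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])  # (L-k+1, k, C_in)
    out = np.einsum("tkc,kco->to", windows, w) + b
    return np.maximum(out, 0.0).max(axis=0)


def lstm_last_hidden(x, W, U, b):
    """Single-layer LSTM run left to right; returns the final hidden state.

    x: (L, C_in), W: (C_in, 4H), U: (H, 4H), b: (4H,)  ->  (H,)
    """
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x:
        i, f, o, g = np.split(x_t @ W + h @ U + b, 4)  # input, forget, output gates, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def multi_head_self_attention(x, Wq, Wk, Wv, n_heads):
    """Scaled dot-product self-attention split into n_heads heads, mean-pooled over residues.

    x: (L, C_in), Wq/Wk/Wv: (C_in, D) with D divisible by n_heads  ->  (D,)
    The usual output projection is omitted to keep the sketch short.
    """
    L, D = x.shape[0], Wq.shape[1]
    d = D // n_heads
    q, k, v = ((x @ W).reshape(L, n_heads, d).transpose(1, 0, 2) for W in (Wq, Wk, Wv))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (heads, L, L)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)        # softmax over key positions
    heads = attn @ v                                # (heads, L, d)
    return heads.transpose(1, 0, 2).reshape(L, D).mean(axis=0)


def predict(seq_x, ss_x, params):
    """Return P(anticancer) for one peptide under the assumed three-branch layout."""
    feature_space = np.concatenate([seq_x, ss_x], axis=1)        # sequence + secondary structure
    cnn_feat = conv1d_maxpool(feature_space, *params["cnn"])     # local correlations
    lstm_feat = lstm_last_hidden(feature_space, *params["lstm"])  # contextual dependencies
    attn_feat = multi_head_self_attention(seq_x, *params["attn"], n_heads=4)  # sequence only
    fused = np.concatenate([cnn_feat, lstm_feat, attn_feat])     # cascade the three feature groups
    w, b = params["out"]
    return sigmoid(fused @ w + b)


# Hypothetical sizes and untrained random weights, for shape-checking only.
rng = np.random.default_rng(0)
SEQ_LEN, C_SEQ, C_SS = 50, 20, 3
C_IN, K, C_OUT, H, D = C_SEQ + C_SS, 5, 32, 32, 32
params = {
    "cnn": (rng.normal(0, 0.1, (K, C_IN, C_OUT)), np.zeros(C_OUT)),
    "lstm": (rng.normal(0, 0.1, (C_IN, 4 * H)), rng.normal(0, 0.1, (H, 4 * H)), np.zeros(4 * H)),
    "attn": tuple(rng.normal(0, 0.1, (C_SEQ, D)) for _ in range(3)),
    "out": (rng.normal(0, 0.1, C_OUT + H + D), 0.0),
}
seq_x = np.eye(C_SEQ)[rng.integers(0, C_SEQ, SEQ_LEN)]  # random one-hot residues
ss_x = np.eye(C_SS)[rng.integers(0, C_SS, SEQ_LEN)]     # random one-hot H/E/C states
print(f"P(anticancer) = {predict(seq_x, ss_x, params):.3f}")
```

Feeding the CNN and LSTM the same feature space side by side, rather than stacking one on the other, is one plausible reading of a "parallel combined" network; the actual wiring, depths, and pooling choices in CL-ACP may differ.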
CL-ACP can effectively recognize antimicrobial peptides, especially anticancer peptides, and its parallel combined neural network structure requires neither complex feature design nor high time cost. It is suitable for use as a tool in antimicrobial peptide design.