HemoFuse: multi-feature fusion based on multi-head cross-attention for identification of hemolytic peptides.

Affiliations

School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P. R. China.

School of Science, Xi'an Polytechnic University, Xi'an, 710048, P. R. China.

Publication information

Sci Rep. 2024 Sep 28;14(1):22518. doi: 10.1038/s41598-024-74326-3.

DOI:10.1038/s41598-024-74326-3

PMID:39342017

Original article link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11438874/

Abstract

Hemolytic peptides are therapeutic peptides that damage red blood cells. However, therapeutic peptides used in medical treatment must exhibit low toxicity to red blood cells to achieve the desired therapeutic effect. Accurate prediction of the hemolytic activity of therapeutic peptides is therefore essential for the development of peptide therapies. In this study, we propose HemoFuse, a multi-feature cross-fusion model for hemolytic peptide identification. Peptide sequences are converted into feature vectors by a word embedding technique and four hand-crafted feature extraction methods. We apply a multi-head cross-attention mechanism to hemolytic peptide identification for the first time. It captures the interaction between the word embedding features and the hand-crafted features by computing attention over all positions in both, so that the multiple features are deeply fused. Moreover, we visualize the features obtained by this module to enhance its interpretability. On the comprehensive integrated dataset, HemoFuse performs well, with ACC, SP, SN, MCC, F1, AUC, and AP of 0.7575, 0.8814, 0.5793, 0.4909, 0.6620, 0.8387, and 0.7118, respectively. Compared with HemoDL, proposed by Yang et al., these values are higher by 3.32%, 3.89%, 5.93%, 10.6%, 8.17%, 5.88%, and 2.72%, respectively. Ablation experiments further show that the model design is reasonable and efficient. The code and datasets are accessible at https://github.com/z11code/Hemo.
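To make the fusion step concrete, the following is a minimal NumPy sketch of generic multi-head cross-attention, in which queries come from one feature set and keys and values come from the other. It is not the authors' implementation: the function name `multi_head_cross_attention`, the choice of the word embedding features as queries, the shared width `d_model = 16`, the four heads, and the random weights are all assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query_feats, context_feats, params, num_heads):
    """Generic scaled dot-product cross-attention (hypothetical helper).

    query_feats:   (Lq, d_model), e.g. word embedding features per position
    context_feats: (Lk, d_model), e.g. hand-crafted features projected to d_model
    params:        (Wq, Wk, Wv, Wo), each of shape (d_model, d_model)
    Returns the fused features (Lq, d_model) and the attention
    weights (num_heads, Lq, Lk).
    """
    Wq, Wk, Wv, Wo = params
    d_model = query_feats.shape[-1]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Queries come from one feature set, keys and values from the other.
    Q = query_feats @ Wq
    K = context_feats @ Wk
    V = context_feats @ Wv

    def split_heads(x):
        # (L, d_model) -> (num_heads, L, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split_heads(Q), split_heads(K), split_heads(V)

    # Every query position attends to every context position.
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, Lq, Lk)
    weights = softmax(scores, axis=-1)
    heads = weights @ Vh                                    # (H, Lq, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(-1, d_model)  # (Lq, d_model)
    return concat @ Wo, weights

# Toy inputs (assumed shapes, random values; not data from the paper).
rng = np.random.default_rng(0)
d_model, num_heads = 16, 4
embed_feats = rng.normal(size=(10, d_model))  # 10 sequence positions
hand_feats = rng.normal(size=(4, d_model))    # 4 hand-crafted feature vectors
params = tuple(
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)
)

fused, attn = multi_head_cross_attention(embed_feats, hand_feats, params, num_heads)
print(fused.shape, attn.shape)
```

Each row of `attn` is a probability distribution over the hand-crafted feature vectors. Heatmaps of such weights are one common way to visualize an attention module, although the abstract does not say which visualization the authors use.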
