EACVP: An ESM-2 LM Framework Combined CNN and CBAM Attention to Predict Anti-coronavirus Peptides.

Author information

Zhang Shengli, Jing Yuanyuan, Liang Yunyun

Affiliations

School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P.R. China.

Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, P.R. China.

Publication information

Curr Med Chem. 2025;32(10):2040-2054. doi: 10.2174/0109298673287899240303164403.

Abstract

BACKGROUND

The novel coronavirus pneumonia (COVID-19) outbreak of late 2019 has killed millions of people worldwide. Coronaviruses such as severe acute respiratory syndrome coronavirus (SARS-CoV) and SARS-CoV-2 cause serious respiratory disease. Many peptides of the host defense system have antiviral activity, so building efficient models to identify anti-coronavirus peptides is a worthwhile research problem.

METHODS

Given this, we propose a new prediction model, EACVP. It uses the Evolutionary Scale Modeling language model ESM-2 (ESM-2 LM) to characterize peptide sequence information. ESM-2 is a protein language model: a transformer trained on protein sequences with a masked language modeling objective borrowed from natural language processing. It was pretrained on a highly diverse and dense dataset (UR50/D 2021_04), and we use this pretrained model to obtain 320-dimensional peptide sequence features. Compared with traditional feature extraction methods, the information represented by ESM-2 LM is more comprehensive and stable. The features are then fed into a convolutional neural network (CNN), and the lightweight convolutional block attention module (CBAM) applies attention to the CNN feature maps along both the channel and spatial dimensions. To validate the model architecture, we performed ablation experiments on the benchmark and independent test datasets, and we compared EACVP with existing methods on the independent test dataset.
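The abstract names CBAM but does not give its layer sizes, so the following is a minimal pure-Python sketch of the standard CBAM formulation (channel attention, then spatial attention), adapted to a 1D feature map of shape (channels, positions) such as a CNN over a peptide representation would produce. Everything here is an assumption for illustration: the function names, the toy sizes, the reduction ratio `r`, the spatial kernel size (7, the default in the original CBAM design), the omission of biases and the random untrained weights. It is not the EACVP implementation; the authors' actual code is in their repository.

```python
import math
import random

# Hypothetical sketch of CBAM on a 1D CNN feature map (channels x positions).
# Weights are plain lists; no training, no biases. Not the EACVP code.


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w1, w2):
    """Mc = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))), one weight per channel."""
    avg = [sum(ch) / len(ch) for ch in fmap]  # average over positions
    mx = [max(ch) for ch in fmap]             # max over positions

    def mlp(v):
        # Shared two-layer MLP: C -> C/r with ReLU, then C/r -> C
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    weights = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    # Rescale every position of channel c by that channel's weight
    return [[w * x for x in ch] for w, ch in zip(weights, fmap)]


def spatial_attention(fmap, kernel_avg, kernel_max):
    """Ms = sigmoid(conv([AvgPool_c(F); MaxPool_c(F)])), one weight per position."""
    n_ch, length = len(fmap), len(fmap[0])
    avg = [sum(ch[i] for ch in fmap) / n_ch for i in range(length)]  # pool across channels
    mx = [max(ch[i] for ch in fmap) for i in range(length)]
    pad = len(kernel_avg) // 2  # odd kernel + zero padding keeps the length unchanged
    weights = []
    for i in range(length):
        s = 0.0
        for j, (ka, km) in enumerate(zip(kernel_avg, kernel_max)):
            p = i + j - pad
            if 0 <= p < length:
                s += ka * avg[p] + km * mx[p]
        weights.append(sigmoid(s))
    # Rescale every channel at position i by the same spatial weight
    return [[w * x for w, x in zip(weights, ch)] for ch in fmap]


def cbam(fmap, w1, w2, kernel_avg, kernel_max):
    # CBAM applies channel attention first, then spatial attention
    refined = channel_attention(fmap, w1, w2)
    return spatial_attention(refined, kernel_avg, kernel_max)


if __name__ == "__main__":
    random.seed(0)
    C, L, r, k = 8, 10, 2, 7  # hypothetical: channels, positions, reduction ratio, kernel size
    fmap = [[random.uniform(-1, 1) for _ in range(L)] for _ in range(C)]
    w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]
    ka = [random.gauss(0, 0.1) for _ in range(k)]
    km = [random.gauss(0, 0.1) for _ in range(k)]
    out = cbam(fmap, w1, w2, ka, km)
    print(len(out), len(out[0]))  # shape is preserved: 8 10
```

Because both attention maps pass through a sigmoid, CBAM only rescales the feature map and never changes its shape, which is what makes it a cheap add-on to an existing CNN. The small reduction ratio here is chosen only because the toy map has 8 channels.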

RESULTS

Experimental results show that the ACC, F1-score, and MCC of EACVP are 3.95%, 35.65%, and 0.0725 higher, respectively, than those of state-of-the-art methods. We also tested EACVP on the ENNAVIA-C and ENNAVIA-D datasets, and the results show that EACVP transfers well to other datasets and is a powerful tool for predicting anti-coronavirus peptides.
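ACC, F1-score, and MCC are standard confusion-matrix metrics; the abstract does not spell out how EACVP computes them, so the sketch below simply uses the conventional definitions. MCC ranges from -1 to 1, which is presumably why its gain is quoted as an absolute difference (0.0725); the abstract does not say whether the ACC and F1 gains are percentage points or relative changes. The function name and the confusion counts in the usage line are hypothetical, and returning 0.0 when a denominator is zero is a convention choice.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, F1-score and Matthews correlation coefficient for a binary classifier."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    # F1 = 2 * precision * recall / (precision + recall) = 2TP / (2TP + FP + FN)
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    # MCC uses all four cells, so it stays informative when classes are imbalanced
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"ACC": acc, "F1": f1, "MCC": mcc}


# Hypothetical counts: few positive peptides, many negatives
print(binary_metrics(tp=30, fp=20, tn=900, fn=10))
```

With these made-up counts ACC is about 0.97 while F1 is about 0.67 and MCC about 0.65, which illustrates why F1 and MCC are reported alongside ACC when positives are scarce.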

CONCLUSION

The results show that EACVP fully characterizes peptide information, achieves high prediction accuracy, and generalizes to different datasets. The data and code are available at https://github.com/JYY625/EACVP.git.
