TransC-ac4C: Identification of N4-Acetylcytidine (ac4C) Sites in mRNA Using Deep Learning.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1403-1412. doi: 10.1109/TCBB.2024.3386972. Epub 2024 Oct 9.

Abstract

N4-acetylcytidine (ac4C) is a post-transcriptional mRNA modification that plays a critical role in the stability and regulation of mRNA translation. In the past few years, numerous approaches based on convolutional neural networks (CNNs) and Transformers have been proposed for identifying ac4C sites, with each type of approach possessing distinct characteristics. CNN-based methods excel at extracting local features and positional information, whereas Transformer-based methods stand out in establishing long-range dependencies and generating global representations. Given the importance of both local and global features in mRNA ac4C site identification, we propose a novel method, termed TransC-ac4C, which combines a CNN and a Transformer to enhance feature extraction and improve identification accuracy. Five feature encoding strategies (One-hot, NCP, ND, EIIP, and K-mer) are employed to generate the mRNA sequence representations, so that both the sequence attributes and the physicochemical properties of the sequences are embedded. To strengthen the relevance of the features, we construct a novel feature fusion method. First, a CNN processes each of the five individual features, and the outputs are concatenated and fed to the Transformer layer. Then, our approach uses the CNN to extract local features and, subsequently, the Transformer to establish global long-range dependencies among the extracted features. We evaluate the model with 5-fold cross-validation, and the evaluation metrics improve significantly. The prediction accuracy on the two datasets reaches 81.42% and 80.69%, respectively, demonstrating the competitiveness and generalization performance of our model.
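
The five encodings named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact variants used in TransC-ac4C, so everything below is an assumption based on the definitions commonly used for RNA sequence encoding: the base order A, C, G, U; the NCP triplets (ring structure, functional group, hydrogen bonding); the EIIP constants; ND as the running density of each base; and K-mer as normalized k-mer frequencies with k = 3. The function names are hypothetical and are not taken from the paper.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed base order; the paper's order is not stated

# Commonly used lookup tables (assumed, not confirmed by the abstract).
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}
# NCP: (purine ring, amino group, weak hydrogen bonding)
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}
# EIIP: electron-ion interaction pseudopotential per nucleotide
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}


def normalize(seq):
    """Upper-case the sequence and map DNA-style T to U."""
    return seq.upper().replace("T", "U")


def one_hot(seq):
    """One 4-dimensional indicator vector per position."""
    return [list(ONE_HOT[b]) for b in normalize(seq)]


def ncp(seq):
    """One 3-dimensional chemical-property vector per position."""
    return [list(NCP[b]) for b in normalize(seq)]


def nd(seq):
    """Nucleotide density: at position i (1-based), the fraction of the
    first i bases that equal the base found at position i."""
    counts = {}
    densities = []
    for i, base in enumerate(normalize(seq), start=1):
        counts[base] = counts.get(base, 0) + 1
        densities.append(counts[base] / i)
    return densities


def eiip(seq):
    """One EIIP value per position."""
    return [EIIP[b] for b in normalize(seq)]


def kmer_freq(seq, k=3):
    """Normalized frequency of every possible k-mer, in lexicographic
    order over ALPHABET (a vector of length 4**k)."""
    seq = normalize(seq)
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in counts]
```

For the toy sequence `ACGUA`, `nd` yields the densities 1, 1/2, 1/3, 1/4, and 2/5, while `kmer_freq` with `k=2` returns a 16-dimensional vector that sums to 1. One-hot, NCP, ND, and EIIP produce one value or vector per position, whereas this K-mer variant produces a single fixed-length vector per sequence; how the paper aligns these shapes before the CNN is not described in the abstract.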

