An improved residual network using deep fusion for identifying RNA 5-methylcytosine sites.

Affiliation

School of Mathematics and Statistics, Xidian University, Xi'an 710071, P. R. China.

Publication information

Bioinformatics. 2022 Sep 15;38(18):4271-4277. doi: 10.1093/bioinformatics/btac532.

Abstract

MOTIVATION

5-Methylcytosine (m5C) is a crucial post-transcriptional modification. With advances in technology, it has been found in a wide range of RNAs. Numerous studies indicate that m5C plays an essential role in biological processes such as tRNA recognition, stabilization of RNA structure and RNA metabolism. Traditional identification through wet-lab experiments is costly and time-consuming, so computational models are commonly used to identify m5C sites. Given the computational advantages of deep learning, it is feasible to construct such a predictive model with deep learning algorithms.

RESULTS

In this study, we construct a model to identify m5C sites based on a deep fusion approach with an improved residual network. First, sequence features are extracted from the RNA sequences using Kmer, K-tuple nucleotide frequency component (KNFC), pseudo dinucleotide composition (PseDNC) and physical and chemical property (PCP). Kmer and KNFC extract information from a statistical point of view, whereas PseDNC and PCP extract information from the physicochemical properties of RNA sequences. Then, the two parts of information are fused into new features using bidirectional long short-term memory (BiLSTM) and attention mechanisms, respectively. The fused features are then fed into the improved residual network for classification. Finally, 10-fold cross-validation and independent set testing are used to verify the credibility of the model. The results show that the accuracy reaches 91.87%, 95.55%, 92.27% and 95.60% on the training sets and independent test sets of Arabidopsis thaliana and Mus musculus, respectively. This is a considerable improvement over previous studies and demonstrates the robust performance of our model.
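
As a rough illustration of the statistical feature extraction mentioned above, the following is a minimal sketch of a Kmer-style encoding: it counts every overlapping k-mer in an RNA sequence and normalizes the counts to frequencies. The function name `kmer_frequencies`, the choice of k = 2, the normalization and the example sequence are all assumptions made for illustration; the abstract does not state the settings used in the study, and the authors' actual implementation is the one in their repository.

```python
from itertools import product
from typing import List


def kmer_frequencies(seq: str, k: int = 2) -> List[float]:
    """Return normalized k-mer frequencies over the RNA alphabet ACGU.

    Hypothetical helper for illustration only; k and the normalization
    are assumptions, not settings taken from the paper.
    """
    alphabet = "ACGU"
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    # Fixed ordering of all 4**k possible k-mers: AA, AC, AG, AU, CA, ...
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Example: a made-up 8-nt sequence encoded as a 16-dimensional vector.
features = kmer_frequencies("ACGUACGU", k=2)
```

Each sequence maps to a vector of fixed length 4^k regardless of sequence length, which is what allows such statistical features to be combined with other fixed-size encodings before being passed to a downstream network.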

AVAILABILITY AND IMPLEMENTATION

The data and code related to the study are available at https://github.com/alivelxj/m5c-DFRESG.
