College of Software, Jilin University, Changchun, 130012, China.
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China.
BMC Bioinformatics. 2023 Feb 27;24(1):68. doi: 10.1186/s12859-023-05191-6.
Although non-coding RNAs (ncRNAs) are an active research topic in the life sciences, the functions of many ncRNAs remain unclear. Recent studies have found that ncRNAs of the same family have similar functions, so accurately predicting the family of an ncRNA is an important step toward identifying its function. Existing methods for ncRNA family prediction fall into two categories: those based on secondary structure features and those based on sequence features. Structure-based methods require a complicated process, and the secondary structures they rely on are obtained with low accuracy. Sequence-based methods have a simpler prediction process and higher accuracy, but still leave room for improvement. A new method is therefore needed to predict ncRNA families more accurately.
This study proposes ncDENSE, a deep learning method that predicts ncRNA families by extracting ncRNA sequence features. The bases in each ncRNA sequence are one-hot encoded and fed into an ensemble deep learning model that combines a dynamic bi-directional gated recurrent unit (Bi-GRU), a dense convolutional network (DenseNet), and an attention mechanism (AM). Specifically, the dynamic Bi-GRU extracts contextual feature information and captures long-term dependencies in the sequences. The AM assigns different weights to the features extracted by the Bi-GRU and focuses attention on the information with greater weights. DenseNet extracts local feature information from the sequences, and a fully connected layer performs the classification. In our results, ncDENSE improved Accuracy, Sensitivity, Precision, F-score, and MCC by 2.08%, 2.33%, 2.14%, 2.16%, and 2.39%, respectively, compared with the second-best method.
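As a rough illustration of two steps described above, the following Python sketch one-hot encodes a sequence and applies a generic softmax attention pooling over per-position features. It is a minimal sketch under stated assumptions, not the ncDENSE implementation: the function names `one_hot_encode` and `attention_pool`, the channel order A, C, G, U, the mapping of T to U, the all-zero encoding of ambiguous bases, the zero padding, and the softmax form of the attention weights are all hypothetical choices that the abstract does not specify.

```python
import math

BASES = "ACGU"  # assumed channel order; the abstract does not specify one


def one_hot_encode(seq, length=None):
    """Encode an RNA sequence as a list of 4-element one-hot rows.

    If `length` is given, the sequence is truncated or zero-padded to it
    (an assumption made here so that batches have a fixed shape).
    """
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA-style T as U
    if length is not None:
        seq = seq[:length]
    rows = []
    for base in seq:
        row = [0.0] * len(BASES)
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1.0
        # Ambiguous symbols such as N stay all-zero (assumption).
        rows.append(row)
    if length is not None:
        padding = length - len(rows)
        rows.extend([0.0] * len(BASES) for _ in range(padding))
    return rows


def attention_pool(features, scores):
    """Weight per-position feature rows by softmax(scores) and sum them.

    `features` stands in for the outputs of a recurrent layer and `scores`
    for learned relevance scores; both are supplied by the caller here.
    Returns (weights, pooled_vector).
    """
    if len(features) != len(scores) or not features:
        raise ValueError("features and scores must be non-empty and aligned")
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [
        sum(w * row[j] for w, row in zip(weights, features))
        for j in range(dim)
    ]
    return weights, pooled
```

Calling `one_hot_encode("GCAU", length=6)` yields six rows of four values, the last two being zero padding. In a full model the scores passed to `attention_pool` would be produced by trainable layers rather than supplied by hand.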
Overall, ncDENSE extracts sequence features of ncRNAs with a dynamic Bi-GRU and DenseNet, and improves accuracy and the other evaluation metrics in ncRNA family prediction.