Fiannaca Antonino, La Rosa Massimo, La Paglia Laura, Rizzo Riccardo, Urso Alfonso
ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa, Palermo, 90146 Italy.
BioData Min. 2017 Aug 1;10:27. doi: 10.1186/s13040-017-0148-2. eCollection 2017.
Non-coding RNAs (ncRNAs) are small non-coding sequences involved in the regulation of gene expression in many biological processes and diseases. The recent discovery of a large set of ncRNAs with biologically relevant roles has opened the way to methods able to discriminate between the different ncRNA classes. Moreover, the incomplete knowledge of the mechanisms underlying regulatory processes, together with the development of high-throughput technologies, has made bioinformatics tools necessary to give biologists and clinicians a deeper understanding of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on feature extraction from the ncRNA secondary structure, combined with a supervised classification algorithm that implements a deep learning architecture based on convolutional neural networks.
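The abstract does not detail which secondary-structure features nRC extracts. As a purely illustrative sketch, assuming the structure is available in dot-bracket notation (for example, as output by an RNA folding tool), the hypothetical helpers `pair_table` and `structure_features` below recover base pairs with a stack and turn one structure into a small feature vector: the fraction of paired positions, the number of hairpin loops, and normalized dot-bracket k-mer frequencies. These features are an assumption chosen for illustration, not the features used by nRC; a fixed-length vector of this kind is what a supervised classifier, such as a convolutional network, would take as input.

```python
import re
from collections import Counter


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner in a dot-bracket string."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # remember the opening base
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()  # innermost unmatched '(' pairs with this ')'
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def structure_features(structure: str, k: int = 3) -> dict[str, float]:
    """Hypothetical feature vector for one secondary structure (not nRC's)."""
    n = len(structure)
    if n < k:
        raise ValueError("structure shorter than k")
    pairs = pair_table(structure)
    features: dict[str, float] = {
        # Share of positions that take part in a base pair.
        "paired_fraction": len(pairs) / n,
        # A hairpin loop is '(' followed only by unpaired dots, then ')'.
        "hairpins": float(len(re.findall(r"\(\.+\)", structure))),
    }
    # Relative frequency of every overlapping k-mer of dot-bracket symbols.
    total = n - k + 1
    kmers = Counter(structure[i:i + k] for i in range(total))
    for kmer, count in sorted(kmers.items()):
        features[f"kmer:{kmer}"] = count / total
    return features


feats = structure_features("((((...))))..((...))")
print(feats["paired_fraction"], feats["hairpins"])  # → 0.6 2.0
```

In a real pipeline the k-mer keys would be fixed in advance (all 27 three-symbol combinations, for instance) so that every sequence maps to a vector of the same length.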
We tested our approach on the classification of 13 different ncRNA classes and evaluated it with the most common statistical measures. In particular, we reached an accuracy and a sensitivity of about 74%.
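The abstract reports accuracy and sensitivity but does not say how sensitivity is averaged over the 13 classes. The sketch below assumes a multi-class confusion matrix with rows as true classes and columns as predicted classes, and computes accuracy and macro-averaged sensitivity (the mean of per-class TP / (TP + FN)); the function names and the example matrices are hypothetical. Note that when every class has the same number of test sequences, macro-averaged sensitivity equals accuracy.

```python
def accuracy(cm: list[list[int]]) -> float:
    """Fraction of correctly classified samples (diagonal over total)."""
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("empty confusion matrix")
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_sensitivity(cm: list[list[int]]) -> float:
    """Mean per-class sensitivity TP / (TP + FN), i.e. macro-averaged recall.

    Rows are true classes and columns are predicted classes, so each row
    sum is TP + FN for that class. Classes with no test samples are skipped.
    """
    recalls = [cm[i][i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]
    if not recalls:
        raise ValueError("no class has test samples")
    return sum(recalls) / len(recalls)


# Hypothetical 3-class example with 10 test sequences per class:
# accuracy and macro sensitivity coincide (22/30 each).
balanced = [[8, 1, 1], [2, 7, 1], [0, 3, 7]]
print(accuracy(balanced), macro_sensitivity(balanced))

# Hypothetical unbalanced 2-class example: the two scores differ
# (accuracy 25/30, macro sensitivity (0.9 + 0.7) / 2).
unbalanced = [[18, 2], [3, 7]]
print(accuracy(unbalanced), macro_sensitivity(unbalanced))
```

Macro averaging weights every class equally, so a rare ncRNA class that is poorly recognized pulls the score down as much as a common one; a sample-weighted average would instead reproduce plain accuracy.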
The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool, which to date is the reference classifier. The nRC tool is freely available as a Docker image at https://hub.docker.com/r/tblab/nrc/, and its source code is available at https://github.com/IcarPA-TBlab/nrc.