IBISC, Université Evry, Université Paris-Saclay, Evry, France.
Institute of Plant Sciences Paris-Saclay, INRA, CNRS, Université Paris-Sud, Université d'Evry, Université Paris-Diderot, Orsay, France.
Bioinformatics. 2018 Sep 1;34(17):i620-i628. doi: 10.1093/bioinformatics/bty572.
Non-coding RNAs (ncRNAs) play important roles in many biological processes and are involved in many diseases. Their identification is an important task, and many tools exist in the literature for this purpose. However, almost all of them focus on discriminating coding RNAs from ncRNAs without giving more biological insight. In this paper, we propose a new reliable method called IRSOM, based on a supervised Self-Organizing Map (SOM) with a rejection option, that overcomes these limitations. The rejection option in IRSOM improves the accuracy of the method and also allows identifying the ambiguous transcripts. Furthermore, with the visualization of the SOM, we analyze the rejected predictions and highlight the ambiguity of the transcripts.
IRSOM was tested on datasets of several species from different kingdoms, and showed better results than state-of-the-art tools. The accuracy of IRSOM is always greater than 0.95 for all the species, with an average specificity of 0.98 and an average sensitivity of 0.99. Besides, IRSOM is fast (it takes around 254 s to analyze a dataset of 147 000 transcripts) and is able to handle very large datasets.
IRSOM is implemented in Python and C++. It is available on our software platform EvryRNA (http://EvryRNA.ibisc.univ-evry.fr).
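The general idea of a rejection option mentioned above can be illustrated with a minimal Python sketch. This is not IRSOM's actual rejection rule, which the abstract does not specify; it assumes a generic classifier that outputs one probability per class, a hypothetical confidence threshold, and the label convention 0 = coding, 1 = non-coding. The function names `predict_with_rejection` and `accuracy_on_accepted` are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple


def predict_with_rejection(probs: Sequence[float],
                           threshold: float = 0.8) -> Optional[int]:
    """Return the index of the most probable class, or None when the
    top probability is below the threshold (prediction rejected).

    The threshold value is an assumption made for illustration.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def accuracy_on_accepted(prob_rows: Sequence[Sequence[float]],
                         labels: Sequence[int],
                         threshold: float) -> Tuple[Optional[float], int]:
    """Compute accuracy over the accepted predictions only.

    Returns (accuracy, number_accepted); accuracy is None when every
    prediction was rejected.
    """
    accepted = 0
    correct = 0
    for probs, label in zip(prob_rows, labels):
        pred = predict_with_rejection(probs, threshold)
        if pred is None:
            continue  # ambiguous transcript: set aside, not scored
        accepted += 1
        correct += int(pred == label)
    accuracy = correct / accepted if accepted else None
    return accuracy, accepted


# Toy class probabilities for four transcripts (made-up values).
toy_probs: List[List[float]] = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.40, 0.60],
]
toy_labels = [0, 1, 1, 0]

strict = accuracy_on_accepted(toy_probs, toy_labels, threshold=0.8)
lenient = accuracy_on_accepted(toy_probs, toy_labels, threshold=0.5)
```

On this toy input, the stricter threshold sets the two low-confidence transcripts aside and scores only the confident ones, which is the trade-off a rejection option makes: fewer predictions, but the rejected cases can be inspected separately as ambiguous.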