FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome.

Authors

Wucher Valentin, Legeai Fabrice, Hédan Benoît, Rizk Guillaume, Lagoutte Lætitia, Leeb Tosso, Jagannathan Vidhya, Cadieu Edouard, David Audrey, Lohi Hannes, Cirera Susanna, Fredholm Merete, Botherel Nadine, Leegwater Peter A J, Le Béguec Céline, Fieten Hille, Johnson Jeremy, Alföldi Jessica, André Catherine, Lindblad-Toh Kerstin, Hitte Christophe, Derrien Thomas

Affiliations

Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France.

IGEPP, BIPAA, INRA, Campus Beaulieu, Le Rheu 35653, France.

Publication information

Nucleic Acids Res. 2017 May 5;45(8):e57. doi: 10.1093/nar/gkw1306.

Abstract

Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking versus five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc.
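The two feature families named in the abstract, multi k-mer frequencies and relaxed open reading frames, can be illustrated with a minimal sketch. This is not FEELnc's own code: the function names `kmer_frequencies` and `longest_orf_length`, the `require_start`/`require_stop` switches, and the restriction to the three forward frames are all assumptions made for illustration. In a full pipeline, vectors like these would be computed per transcript and passed to a Random Forest classifier.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k):
    """Relative frequency of each of the 4**k DNA k-mers in seq.

    Overlapping windows are counted; windows containing characters
    other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def longest_orf_length(seq, require_start=True, require_stop=True):
    """Length in nucleotides of the longest ORF in the three forward frames.

    Illustrative assumption: a "relaxed" ORF is modelled by dropping the
    requirement for an ATG start codon and/or a terminating stop codon.
    When present, the stop codon is included in the reported length.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        # Without a start requirement, an ORF may begin at the frame start.
        start = None if require_start else 0
        for idx, codon in enumerate(codons):
            if start is None and codon == "ATG":
                start = idx
            if codon in STOP_CODONS:
                if start is not None:
                    best = max(best, (idx - start + 1) * 3)
                start = None if require_start else idx + 1
        # Without a stop requirement, an open ORF may run to the frame end.
        if start is not None and not require_stop:
            best = max(best, (len(codons) - start) * 3)
    return best


def transcript_features(seq, ks=(1, 2, 3)):
    """Concatenate multi k-mer frequencies with strict and relaxed ORF coverage."""
    features = []
    for k in ks:
        freqs = kmer_frequencies(seq, k)
        features.extend(freqs[m] for m in sorted(freqs))
    length = max(len(seq), 1)
    features.append(longest_orf_length(seq) / length)
    features.append(longest_orf_length(seq, False, False) / length)
    return features
```

For a transcript fragment such as `ATGAAATTTCCC`, which has a start codon but no in-frame stop, the strict setting finds no ORF while the relaxed setting reports the full 12 nucleotides. Tolerance of this kind is useful for reconstructed transcripts, which may be incomplete.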
