ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences.

Affiliations

Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia.

Kurchatov Center for Genome Research, Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia.

Publication information

Genes (Basel). 2023 Jun 24;14(7):1331. doi: 10.3390/genes14071331.

Abstract

Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown that lncRNAs perform diverse and important functions in plants. To extend knowledge of lncRNAs to other species, computational pipelines have recently been developed that run standardised data-processing steps without requiring user intervention until the final result is produced. These advances broaden the capabilities available for identifying and analysing lncRNAs. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software to identify lncRNAs and allows the recognition parameters to be tuned using genomic data for which lncRNA annotation is available. The pipeline predicts lncRNA candidates, aligns lncRNA sequences to the reference genome, filters out erroneous/noise transcripts and probable transposable elements, classifies lncRNAs by genomic location, compares them with sequences from external databases, and analyses lncRNA structural features and expression. We demonstrate the application of the ICAnnoLncRNA pipeline using transcriptomic sequences from 15 maize libraries assembled with Trinity and Hisat2/StringTie.
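The two simplest ideas in this description, the length criterion and classification by genomic location, can be illustrated with a short Python sketch. It uses Python because Snakemake rules are written in it. This is a minimal sketch under stated assumptions, not ICAnnoLncRNA's actual code. The category names (`intergenic`, `antisense`, `intronic`, `sense_overlapping`), their priority order, and all helper names (`Interval`, `Gene`, `classify_location`, `passes_length_filter`) are hypothetical. Coordinates are assumed to be 0-based and half-open.

```python
from dataclasses import dataclass
from typing import List

# lncRNAs are defined as RNAs longer than 200 nucleotides.
MIN_LNCRNA_LENGTH = 200


@dataclass
class Interval:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # '+' or '-'


@dataclass
class Gene:
    """Hypothetical model of an annotated protein-coding gene."""
    locus: Interval
    exons: List[Interval]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome overlap (strand ignored)."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def passes_length_filter(seq: str) -> bool:
    """Keep only transcripts strictly longer than 200 nt."""
    return len(seq) > MIN_LNCRNA_LENGTH


def classify_location(tx: Interval, genes: List[Gene]) -> str:
    """Assign a transcript to a location class (illustrative scheme and priority).

    Priority: sense_overlapping > intronic > antisense > intergenic.
    """
    hits = [g for g in genes if overlaps(tx, g.locus)]
    same_strand = [g for g in hits if g.locus.strand == tx.strand]
    opposite_strand = [g for g in hits if g.locus.strand != tx.strand]

    # Overlaps an exon of a same-strand gene.
    if any(overlaps(tx, exon) for g in same_strand for exon in g.exons):
        return "sense_overlapping"
    # Inside a same-strand gene but touching none of its exons.
    if same_strand:
        return "intronic"
    # Overlaps a gene only on the opposite strand.
    if opposite_strand:
        return "antisense"
    return "intergenic"


if __name__ == "__main__":
    gene = Gene(
        locus=Interval("chr1", 1000, 5000, "+"),
        exons=[Interval("chr1", 1000, 1500, "+"), Interval("chr1", 4000, 5000, "+")],
    )
    for tx in [
        Interval("chr1", 1200, 1800, "+"),
        Interval("chr1", 2000, 2500, "+"),
        Interval("chr1", 2000, 2500, "-"),
        Interval("chr1", 6000, 7000, "+"),
    ]:
        print(tx, classify_location(tx, [gene]))
```

With the single two-exon gene in the demo, the four transcripts are classified as `sense_overlapping`, `intronic`, `antisense` and `intergenic`, in that order. A real pipeline would read genes and transcripts from GTF/GFF annotation and genome alignments rather than from hand-built objects, and it might also use distance windows around genes. Those details are left open here.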
