Unidad de Genómica Avanzada, Langebio, Cinvestav, 36824, Irapuato, Guanajuato, Mexico.
Bioinformatics Group, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany.
Sci Rep. 2022 Aug 18;12(1):14063. doi: 10.1038/s41598-022-18254-0.
Long non-coding RNAs (lncRNAs) are a prominent class of eukaryotic regulatory genes. Despite the many available transcriptomic datasets, plant lncRNA annotation still relies on dated annotations that have been carried over historically. We present a substantially improved annotation of Arabidopsis thaliana lncRNAs, generated by integrating 224 transcriptomes from multiple tissues, conditions, and developmental stages. We annotate 6764 lncRNA genes, including 3772 that are novel. We characterize their tissue expression patterns and find that 1425 lncRNAs are co-expressed with coding genes, with enriched functional categories such as chloroplast organization, photosynthesis, RNA regulation, transcription, and root development. This improved transcription-guided annotation constitutes a valuable resource for studying lncRNAs and the biological processes they may regulate.