Genome-wide annotation of transcript boundaries using bacterial Rend-seq datasets.

Affiliations

Life Sciences Department, University of Bath, Claverton Down, Bath, BA2 7AY, UK.

Milner Centre for Evolution, Life Sciences Department, University of Bath, Claverton Down, Bath, BA2 7AY, UK.

Publication details

Microb Genom. 2024 Apr;10(4). doi: 10.1099/mgen.0.001239.

Abstract

Accurate annotation of the transcribed regions of a genome at single-nucleotide resolution is key to analysing RNA-seq data optimally, understanding regulatory events and designing experiments. However, most genome annotations currently provided by GenBank lack information about untranslated regions (UTRs). In addition, the genomic locations of non-coding RNAs, such as sRNAs or antisense RNAs, are frequently missing. To provide such information, diverse RNA-seq technologies, such as Rend-seq, have been developed and applied to many bacterial species. However, incorporating this vast amount of information into annotation files has been limited and is bioinformatically challenging, so UTRs and other non-coding elements are often overlooked or misrepresented. To overcome this problem, we present pyRAP (python Rend-seq Annotation Pipeline), a software package that analyses Rend-seq datasets to resolve transcript boundaries accurately across the whole genome. We report the use of pyRAP to find novel transcripts, transcript isoforms and RNase-dependent sRNA processing events. In we uncovered 63 novel transcripts and provide genomic coordinates with single-nucleotide resolution for 2218 5'UTRs, 1864 3'UTRs and 161 non-coding RNAs. In we report 117 novel transcripts, 2429 5'UTRs, 1619 3'UTRs and 91 non-coding RNAs, and in 16 novel transcripts, 664 5'UTRs, 696 3'UTRs and 81 non-coding RNAs. Finally, we use pyRAP to produce updated annotation files for , , and for use by the wider microbial genomics research community.
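In Rend-seq, transcript 5' and 3' ends appear as sharp spikes in per-nucleotide read-end counts, and a UTR can then be read off as the span between such a boundary and the annotated coding sequence. The sketch below illustrates that idea only. It is not pyRAP's implementation: it is a generic local z-score peak caller followed by simple coordinate arithmetic, and the function names and parameter values (`half_window`, `gap`, `z_threshold`, `min_count`) are hypothetical choices for illustration.

```python
from statistics import mean, pstdev


def call_end_peaks(counts, half_window=50, gap=2, z_threshold=7.0, min_count=5):
    """Return 0-based positions whose read-end count stands out from local background.

    counts: per-nucleotide read-end counts for one strand (list of ints).
    Background is taken from flanking windows on both sides, excluding `gap`
    nucleotides immediately around the candidate. All defaults are hypothetical.
    """
    peaks = []
    for i, value in enumerate(counts):
        if value < min_count:  # ignore positions with too few read ends
            continue
        left = counts[max(0, i - gap - half_window):max(0, i - gap)]
        right = counts[i + gap + 1:i + gap + 1 + half_window]
        background = left + right
        if len(background) < 2:
            continue
        mu = mean(background)
        sd = pstdev(background)
        # +1.0 in the denominator avoids division by zero on flat backgrounds
        z = (value - mu) / (sd + 1.0)
        if z >= z_threshold:
            peaks.append(i)
    return peaks


def five_prime_utr(tss, gene_left, gene_right, strand):
    """Return the 1-based inclusive (start, end) of a 5'UTR, or None.

    tss: 1-based transcript start; gene_left/gene_right: CDS coordinates with
    gene_left < gene_right. On the minus strand the start codon is at gene_right.
    """
    if strand == "+":
        return (tss, gene_left - 1) if tss < gene_left else None
    return (gene_right + 1, tss) if tss > gene_right else None


if __name__ == "__main__":
    counts = [1] * 200
    counts[120] = 60  # one artificial 5'-end spike
    peaks = call_end_peaks(counts)
    tss = peaks[0] + 1  # convert 0-based index to 1-based coordinate
    print(peaks, five_prime_utr(tss, 150, 900, "+"))
```

On the toy input, the spike at index 120 is the only called peak, giving a 5'UTR of (121, 149) for a plus-strand gene starting at 150. A real pipeline would also need to treat 5' and 3' end signals and both strands separately, and to handle overlapping genes and noisy coverage.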
