Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Departamento de Biología Molecular, Instituto Universitario de Biología Molecular (IUBM), Universidad Autónoma de Madrid, 28049 Madrid, Spain.
Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Genomic and NGS Facility (GENGS), 28049 Madrid, Spain.
Genes (Basel). 2023 Aug 17;14(8):1637. doi: 10.3390/genes14081637.
Advances in next-generation sequencing methodologies have facilitated the assembly of an ever-increasing number of genomes. Gene annotations are typically conducted via specialized software, but the most accurate results require additional manual curation that incorporates insights derived from functional and bioinformatic analyses (e.g., transcriptomics, proteomics, and phylogenetics). In this study, we improved the annotation of the (strain HU3) genome using publicly available data from the deep sequencing of ribosome-protected mRNA fragments (Ribo-Seq). As a result of this analysis, we uncovered 70 previously non-annotated protein-coding genes and improved the annotation of around 600 genes. Additionally, we present evidence for small upstream open reading frames (uORFs) in a significant number of transcripts, indicating their potential role in the translational regulation of gene expression. The bioinformatics pipelines developed for these analyses can be used to improve the genome annotations of other organisms for which Ribo-Seq data are available. The improvements provided by these studies will bring us closer to the ultimate goal of a complete and accurately annotated genome and will enhance future transcriptomics, proteomics, and genetics studies.
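As a rough illustration of the uORF concept mentioned in the abstract, the following minimal Python sketch scans a transcript's 5' leader for ATG-initiated reading frames that reach an in-frame stop codon before the leader ends. It is not the authors' pipeline: the function name `find_uorfs`, the `min_codons` threshold, and the restriction to ATG start codons are assumptions made for this example. It also works on sequence alone, whereas the study relies on Ribo-Seq footprint evidence to support translation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(leader, min_codons=2):
    """Return (start, end, sequence) for each candidate uORF in a 5' leader.

    A candidate starts at an ATG and ends at the first in-frame stop codon
    found inside the leader. Coordinates are 0-based, end-exclusive, and the
    returned sequence includes the stop codon. `min_codons` counts sense
    codons (start codon included, stop codon excluded); it is a hypothetical
    filter, not a value taken from the paper.
    """
    # Accept RNA or DNA input, in either case.
    leader = leader.upper().replace("U", "T")
    uorfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    uorfs.append((start, pos + 3, leader[start:pos + 3]))
                break
    return uorfs


if __name__ == "__main__":
    for start, end, seq in find_uorfs("CCATGAAATGACC"):
        print(start, end, seq)
```

In a real annotation workflow, candidates like these would only be retained when ribosome-protected fragments map to them, ideally with the three-nucleotide periodicity expected of translating ribosomes.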