Singh Amit, Schermann Géza, Reislöhner Sven, Kellner Nikola, Hurt Ed, Brunner Michael
Heidelberg University Biochemistry Center (BZH), Im Neuenheimer Feld 328, D-69120 Heidelberg, Germany.
Genes (Basel). 2021 Sep 29;12(10):1549. doi: 10.3390/genes12101549.
A correct genome annotation is fundamental for research in molecular and structural biology. The annotation of the reference genome of [species name] has been reported previously, but it is essentially limited to the open reading frames (ORFs) of protein-coding genes and contains only a few noncoding transcripts. In this study, we identified and annotated full-length transcripts of [species name] by deep RNA sequencing. We annotated 7044 coding genes and 4567 noncoding genes. Astonishingly, 23% of the coding genes are alternatively spliced. We identified 679 novel coding genes as well as 2878 novel noncoding genes, and corrected the structural organization of more than 50% of the previously annotated genes. Furthermore, we substantially extended the Gene Ontology (GO) and Enzyme Commission (EC) lists, which provide comprehensive search tools for potential industrial applications and basic research. The identified novel transcripts and improved annotation will help to understand the gene regulatory landscape in [species name]. The analysis pipeline developed here can be used to build transcriptome assemblies and to identify coding and noncoding RNAs in other species.
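The abstract reports that 23% of the coding genes are alternatively spliced. One simple way to derive such a figure from a transcript annotation is sketched below. It is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a GTF-style file with `transcript` feature rows carrying `gene_id` and `transcript_id` attributes, assumes the input has already been restricted to coding genes, and approximates "alternatively spliced" as "has two or more annotated isoforms" (which also counts isoforms that differ only in start or end sites). The function names and example records are hypothetical.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column such as 'gene_id "g1"; transcript_id "t1";'."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def spliced_fraction(gtf_lines):
    """Return (n_genes, n_multi_isoform, fraction).

    Hypothetical helper: a gene counts as alternatively spliced here if it has
    two or more distinct transcript_ids among its 'transcript' rows.
    """
    isoforms = defaultdict(set)  # gene_id -> set of transcript_ids
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript-level rows define isoforms
        attrs = parse_attributes(cols[8])
        isoforms[attrs["gene_id"]].add(attrs["transcript_id"])
    n_genes = len(isoforms)
    n_multi = sum(1 for tx in isoforms.values() if len(tx) > 1)
    return n_genes, n_multi, (n_multi / n_genes if n_genes else 0.0)


# Hypothetical toy annotation: gene g1 has two isoforms, gene g2 has one.
example = [
    'chr1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t1";',
    'chr1\tsrc\ttranscript\t100\t700\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t2";',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t1";',
    'chr2\tsrc\ttranscript\t50\t400\t.\t-\t.\tgene_id "g2"; transcript_id "g2.t1";',
]
print(spliced_fraction(example))  # (2, 1, 0.5)
```

For scale, a fraction of 0.23 over the 7044 annotated coding genes corresponds to roughly 1,600 genes. Counting at the transcript level rather than the exon level keeps the sketch short; distinguishing true splicing events from alternative transcription start or end sites would require comparing intron chains.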