Integrated Laboratory of Morphofunctional Sciences, Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.
Health Systems Engineering Laboratory, Alberto Luiz Coimbra Institute of Graduate Studies and Engineering Research (COPPE), Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.
DNA Res. 2021 Sep 13;28(5). doi: 10.1093/dnares/dsab007.
Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences of fewer than 100 codons. Gene prediction software and annotation screening have historically treated them as junk DNA; however, next-generation sequencing has enabled deeper investigation of junk DNA regions and their transcription products, and smORFs have emerged as a new focus of interest in systems biology. Several smORF peptides were recently reported in non-canonical mRNAs as new players in numerous biological contexts, yet their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features and discusses the most promising approaches to investigate smORFs according to their different characteristics. First, smORFs were divided into non-expressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in non-coding RNAs (ncRNAs) or in canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.
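The three-level classification described in the abstract can be read as a small decision tree. The following is a minimal sketch of that tree, not the authors' implementation. The names `SmorfCandidate`, `classify`, `ncrna_is_long` and `mrna_region` are hypothetical. The abstract does not name the mRNA subclasses or give a length cutoff separating small from long RNAs, so both are left to the caller as plain inputs.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_SMORF_CODONS = 100  # the abstract's definition: smaller than 100 codons


@dataclass(frozen=True)
class SmorfCandidate:
    """Hypothetical record for one candidate ORF (all field names are assumptions)."""
    codons: int                           # ORF length in codons
    transcript_type: Optional[str]        # None (no transcript), "ncRNA" or "mRNA"
    ncrna_is_long: Optional[bool] = None  # required only when the host is an ncRNA
    mrna_region: Optional[str] = None     # caller-supplied label for position along the gene


def classify(c: SmorfCandidate) -> List[str]:
    """Return the path through the classification tree, from root to leaf."""
    if c.codons >= MAX_SMORF_CODONS:
        raise ValueError("not a smORF: 100 codons or more")

    # Level 1: non-expressed (intergenic) versus expressed (genic).
    if c.transcript_type is None:
        return ["non-expressed (intergenic)"]
    path = ["expressed (genic)"]

    # Level 2: host transcript is an ncRNA or a canonical mRNA.
    if c.transcript_type == "ncRNA":
        path.append("ncRNA")
        # Level 3 for ncRNAs: small versus long RNA.
        if c.ncrna_is_long is None:
            raise ValueError("ncrna_is_long must be set for ncRNA hosts")
        path.append("long RNA" if c.ncrna_is_long else "small RNA")
    elif c.transcript_type == "mRNA":
        path.append("canonical mRNA")
        # Level 3 for mRNAs: class by localization along the gene, if known.
        if c.mrna_region is not None:
            path.append(c.mrna_region)
    else:
        raise ValueError("unknown transcript type: " + repr(c.transcript_type))
    return path
```

For example, `classify(SmorfCandidate(codons=45, transcript_type="ncRNA", ncrna_is_long=True))` walks the genic, ncRNA and long RNA branches in that order, while a candidate with `transcript_type=None` stops at the intergenic leaf.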