Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation.
Laboratory of marker-assisted and genomic selection of plants, All-Russian Research Institute of Agricultural Biotechnology, 127550 Moscow, Russian Federation.
Genome Res. 2019 Sep;29(9):1464-1477. doi: 10.1101/gr.253302.119. Epub 2019 Aug 6.
Genomes contain millions of short (<100 codons) open reading frames (sORFs), which are usually dismissed during gene annotation. Nevertheless, peptides encoded by such sORFs can play important biological roles, and their impact on cellular processes has long been underestimated. Here, we analyzed approximately 70,000 transcribed sORFs in the model plant Physcomitrella patens (moss). Several distinct classes of sORFs that differ in their position on transcripts and in their level of evolutionary conservation are present in the moss genome. Over 5000 sORFs were conserved in at least one of 10 plant species examined. Mass spectrometry analysis of proteomic and peptidomic data sets suggested that tens of sORFs located on distinct parts of mRNAs and long noncoding RNAs (lncRNAs) are translated, including conserved sORFs. Translational analysis of the sORFs and main ORFs at a single locus suggested the existence of genes that code for multiple proteins and peptides with tissue-specific expression. Functional analysis of four lncRNA-encoded peptides showed that sORF-encoded peptides are involved in the regulation of growth and differentiation in moss. Knocking out lncRNA-encoded peptides resulted in a decrease in moss growth. In contrast, the overexpression of these peptides resulted in a diverse range of phenotypic effects. Our results thus open new avenues for discovering novel, biologically active peptides in the plant kingdom.
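As an illustration of the sORF definition used above (an open reading frame shorter than 100 codons), the following minimal Python sketch scans a nucleotide sequence for candidate sORFs. It is not the pipeline used in the study, which the abstract does not describe. The function name `find_sorfs`, the restriction to the forward strand, the use of the first in-frame ATG as the start, and the choice to count codons without the stop codon are all assumptions made for the example.

```python
# Hypothetical sketch: enumerate candidate sORFs on the forward strand only.
# Assumptions (not from the source): ORFs start at the first in-frame ATG,
# end at the next in-frame stop codon, and length is counted without the stop.

STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_CODONS = 100  # the text defines sORFs as shorter than 100 codons


def find_sorfs(seq, min_codons=2, max_codons=MAX_CODONS):
    """Return a list of (start, end, frame) tuples for ORFs with
    min_codons <= length < max_codons; end is exclusive and includes the stop."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if min_codons <= n_codons < max_codons:
                    hits.append((start, i + 3, frame))
                start = None  # close this ORF and look for the next ATG
    return hits


if __name__ == "__main__":
    # Toy sequence: ATG AAA CCC TAA, a 3-codon ORF in frame 0.
    print(find_sorfs("ATGAAACCCTAA"))
```

A real analysis would also scan the reverse strand, restrict the search to transcribed regions, and apply conservation or mass spectrometry evidence as described in the abstract; this sketch only makes the length criterion concrete.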