Molecular Plant Physiology, Institute of Environmental Biology, Utrecht University, 3584 CH, Utrecht, The Netherlands.
Theoretical Biology and Bioinformatics, Department of Biology, Utrecht University, 3584 CH, Utrecht, The Netherlands.
RNA. 2019 Mar;25(3):292-304. doi: 10.1261/rna.067983.118. Epub 2018 Dec 19.
Eukaryotic mRNAs contain a 5' leader sequence preceding the main open reading frame (mORF) and, depending on the species, 20%-50% of eukaryotic mRNAs harbor an upstream ORF (uORF) in the 5' leader. An unknown fraction of these uORFs encode sequence-conserved peptides (conserved peptide uORFs, CPuORFs). Experimentally validated CPuORFs that regulate the translation of downstream mORFs often do so in a metabolite concentration-dependent manner. Previous research has shown that most CPuORFs possess a start codon context that is suboptimal for translation initiation, which turns out to be favorable for translational regulation. The suboptimal initiation context may even include non-AUG start codons, which makes CPuORFs hard to predict. For this reason, we developed a novel pipeline that identifies CPuORFs irrespective of start codon, using well-annotated sequence data from 31 eudicot plant species and rice. Our new pipeline identified 29 novel CPuORFs conserved across a wide variety of eudicot species, 15 of which do not initiate with an AUG start codon. In addition to CPuORFs, the pipeline found 14 conserved coding regions directly upstream of and in frame with the mORF, which likely initiate translation at a non-AUG start codon. Altogether, our pipeline identified highly conserved coding regions in the 5' leaders of transcripts, including in genes with proven functional importance such as , a key regulator of the circadian clock, and the subunit of the target of rapamycin (TOR) kinase.
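Allowing non-AUG start codons greatly enlarges the set of candidate uORFs in a 5' leader. A small scan can illustrate this. The sketch below is not the authors' pipeline. It assumes that the candidate start codons are AUG plus the nine near-cognate codons that differ from AUG at a single position, and it uses hypothetical names (`scan_leader`, `CandidateORF`, `near_cognate_starts`). It also omits the cross-species conservation step that the pipeline relies on to separate CPuORFs from chance ORFs. The scan distinguishes two cases: uORFs that end at a stop codon, and in-frame stretches that run into the mORF start codon without a stop, which corresponds to the second category the abstract reports.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

STOP_CODONS = {"TAA", "TAG", "TGA"}


def near_cognate_starts() -> Set[str]:
    """ATG plus every codon differing from ATG at exactly one position.

    Assumption for illustration: which near-cognate codons are actually
    used as starts in plants is not specified in the abstract.
    """
    codons = {"ATG"}
    for pos in range(3):
        for base in "ACGT":
            if base != "ATG"[pos]:
                codons.add("ATG"[:pos] + base + "ATG"[pos + 1:])
    return codons


@dataclass
class CandidateORF:
    start: int                # 0-based offset of the start codon in the transcript
    stop_end: Optional[int]   # exclusive end of the stop codon, or None if no stop found
    start_codon: str
    in_frame_extension: bool  # reaches the mORF start in frame without a stop


def scan_leader(transcript: str, morf_start: int,
                start_codons: Set[str]) -> List[CandidateORF]:
    """Enumerate candidate ORFs whose start codon lies entirely in the 5' leader."""
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA input
    candidates = []
    for i in range(morf_start - 2):  # start codon must end at or before morf_start
        codon = seq[i:i + 3]
        if codon not in start_codons:
            continue
        stop_end = None
        for j in range(i, len(seq) - 2, 3):
            if j == morf_start:
                break  # walked into the mORF start codon in frame
            if seq[j:j + 3] in STOP_CODONS:
                stop_end = j + 3
                break
        in_frame = stop_end is None and (morf_start - i) % 3 == 0
        candidates.append(CandidateORF(i, stop_end, codon, in_frame))
    return candidates


# Toy transcript: an AUG uORF, then a CUG-initiated in-frame extension of the mORF.
toy = "CCC" + "ATG" + "CCC" + "TAA" + "CC" + "CTG" + "CCC" + "ATG" + "CCC" + "TGA"
for orf in scan_leader(toy, morf_start=20, start_codons=near_cognate_starts()):
    print(orf)
```

With only `{"ATG"}` as the start set, the CUG-initiated extension in the toy transcript is missed. This is the kind of candidate that an AUG-biased search cannot detect.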