Graduate School of Medical Sciences, Kanazawa University, Kanazawa, 920-1192, Japan.
Graduate School of Horticulture, Chiba University, Matsudo, 271-8510, Japan.
BMC Genomics. 2020 Mar 30;21(1):260. doi: 10.1186/s12864-020-6662-5.
Upstream open reading frames (uORFs) in the 5'-untranslated regions (5'-UTRs) of certain eukaryotic mRNAs encode evolutionarily conserved functional peptides, such as cis-acting regulatory peptides that control translation of the downstream main ORFs (mORFs). Comparative genomic studies have searched genome-wide for uORFs with conserved peptide sequences (CPuORFs) by comparing uORF sequences between selected species. To increase the chances of identifying CPuORFs, we previously developed an approach that uses BLAST to compare Arabidopsis uORF sequences with those of any other plant species for which transcript sequence databases are available. Applying this approach to multiple plant species from phylogenetically distant clades is expected to identify CPuORFs conserved in various plant lineages more comprehensively, including those conserved within relatively small taxonomic groups.
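The first step of any such comparison is extracting candidate uORFs from each 5'-UTR. The sketch below is a hypothetical helper, not the ESUCA implementation: it assumes a uORF is an ATG-initiated ORF whose in-frame stop codon also lies within the 5'-UTR, so uORFs overlapping the mORF start are deliberately not reported. The function name `find_uorfs` and the `min_codons` threshold are illustrative assumptions.

```python
from typing import List, Tuple

# Standard stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str, min_codons: int = 1) -> List[Tuple[int, int]]:
    """Hypothetical helper (not ESUCA's code).

    Returns 0-based, half-open (start, end) coordinates of ORFs that begin
    with ATG and end with an in-frame stop codon, both inside the 5'-UTR.
    The end coordinate includes the stop codon. ORFs with no in-frame stop
    inside the UTR (e.g. those overlapping the mORF) are skipped.
    """
    seq = utr5.upper().replace("U", "T")  # accept RNA or DNA input
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Scan codon by codon in the same reading frame as this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3  # sense codons, stop excluded
                if n_codons >= min_codons:
                    uorfs.append((start, pos + 3))
                break  # first in-frame stop terminates this ORF
    return uorfs
```

For example, `find_uorfs("CCATGAAATAGCC")` reports a single three-codon uORF (ATG AAA TAG) at `(2, 11)`. In a real pipeline, the translated peptides of such uORFs would then be compared across species, for instance with BLAST as described above.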
To compare uORF sequences efficiently among many species and identify CPuORFs conserved in various taxonomic lineages, we developed a novel pipeline, ESUCA. We applied ESUCA to the genomes of five angiosperm species belonging to phylogenetically distant clades and selected CPuORFs conserved among at least three different orders. These analyses identified 89 novel CPuORF families. As expected, the ESUCA analysis of each of the five angiosperm genomes identified many CPuORFs that were not identified by the ESUCA analyses of the other four species. Unexpectedly, however, these CPuORFs included some conserved across wide taxonomic ranges, indicating that the approach used here is useful for comprehensively identifying not only narrowly conserved CPuORFs but also widely conserved ones. Examining the effects of 11 selected CPuORFs on mORF translation revealed that CPuORFs conserved only in relatively narrow taxonomic ranges can have sequence-dependent regulatory effects, suggesting that most of the identified CPuORFs are conserved because of functional constraints on their encoded peptides.
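The selection criterion "conserved among at least three different orders" can be expressed as a simple filter over the species in which a homologous uORF was found. The sketch below is an illustration under stated assumptions, not ESUCA's actual code: the helper name `conserved_across_orders`, the species-to-order mapping, and the choice to count the query species' own order toward the threshold are all hypothetical.

```python
from typing import Dict, Iterable


def conserved_across_orders(
    query_species: str,
    hit_species: Iterable[str],
    species_to_order: Dict[str, str],
    min_orders: int = 3,
) -> bool:
    """Hypothetical filter (not ESUCA's code).

    Returns True if the query species plus the species carrying homologous
    uORFs span at least `min_orders` distinct taxonomic orders.
    Assumption: the query species' own order counts toward the total.
    Species missing from `species_to_order` are ignored.
    """
    orders = {species_to_order[query_species]}
    orders.update(species_to_order[s] for s in hit_species if s in species_to_order)
    return len(orders) >= min_orders


# Illustrative taxonomy lookup (orders per APG classification).
SPECIES_TO_ORDER = {
    "Arabidopsis thaliana": "Brassicales",
    "Brassica rapa": "Brassicales",
    "Solanum lycopersicum": "Solanales",
    "Oryza sativa": "Poales",
}
```

With this mapping, an Arabidopsis uORF with homologs in *Brassica rapa*, *Solanum lycopersicum*, and *Oryza sativa* spans three orders and passes, whereas one with a homolog only in *Brassica rapa* spans a single order and is filtered out. Counting orders rather than species keeps many hits from closely related species from inflating the apparent conservation breadth.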
This study demonstrates that ESUCA can efficiently identify CPuORFs that are likely conserved because of the functional importance of their encoded peptides. Furthermore, our data show that using ESUCA to compare uORF sequences from multiple species with those of many other species is highly effective for comprehensively identifying CPuORFs conserved across various taxonomic ranges.