Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Genome Res. 2011 Nov;21(11):1916-28. doi: 10.1101/gr.108753.110. Epub 2011 Oct 12.
The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes, especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.
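The idea of scanning an ORF in nine-codon windows for unusually low synonymous substitution rates can be illustrated with a deliberately simplified sketch. It is not the statistical procedure used in the study, which the abstract does not specify. The sketch assumes per-codon observed and expected synonymous substitution counts are already available (for example, from some upstream alignment analysis), and the function name `low_synonymous_windows`, the `max_ratio` threshold, and the input format are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def low_synonymous_windows(
    observed: List[float],
    expected: List[float],
    window: int = 9,
    max_ratio: float = 0.5,
) -> List[Tuple[int, int, float]]:
    """Flag codon windows with a low observed/expected synonymous ratio.

    observed[i] and expected[i] are hypothetical per-codon synonymous
    substitution counts across the aligned species (assumed inputs).
    Returns (start, end_exclusive, ratio) for every window whose ratio
    is at or below max_ratio. The threshold is an illustrative stand-in
    for a proper significance test.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")

    hits: List[Tuple[int, int, float]] = []
    for start in range(len(observed) - window + 1):
        end = start + window
        exp_sum = sum(expected[start:end])
        if exp_sum <= 0:
            # No synonymous substitutions expected here, so the ratio is undefined.
            continue
        ratio = sum(observed[start:end]) / exp_sum
        if ratio <= max_ratio:
            hits.append((start, end, ratio))
    return hits


if __name__ == "__main__":
    # Toy data: a stretch of nine codons with no synonymous changes,
    # flanked by codons that change at the expected rate.
    obs = [1.0] * 5 + [0.0] * 9 + [1.0] * 5
    exp = [1.0] * 19
    for start, end, ratio in low_synonymous_windows(obs, exp, max_ratio=0.2):
        print(f"codons {start}-{end - 1}: ratio {ratio:.3f}")
```

A fixed ratio cutoff keeps the sketch short; a real analysis would need a statistical test that accounts for how many substitutions a window of this size is expected to accumulate across the 29-species alignment.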