BioSciences Institute, University College Cork, Cork, Ireland.
Nucleic Acids Res. 2011 May;39(10):4220-34. doi: 10.1093/nar/gkr007. Epub 2011 Jan 25.
In eukaryotes, it is generally assumed that translation initiation occurs at the AUG codon closest to the messenger RNA 5' cap. However, in certain cases, initiation can occur at codons differing from AUG by a single nucleotide, especially the codons CUG, UUG, GUG, ACG, AUA and AUU. While non-AUG initiation has been experimentally verified for a handful of human genes, the full extent to which this phenomenon is utilized, both for increased coding capacity and potentially for novel regulatory mechanisms, remains unclear. To address this issue, and hence to improve the quality of existing coding sequence annotations, we developed a methodology based on phylogenetic analysis of predicted 5' untranslated regions from orthologous genes. We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes, including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we were able to obtain independent experimental evidence for the expression of non-AUG-initiated products from previously published literature and ribosome profiling data.
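As an illustration of what "codons differing from AUG by a single nucleotide" means in practice, the following minimal Python sketch enumerates those codons and scans a 5' UTR for in-frame candidates upstream of an annotated AUG. It is not the phylogenetic method described in the abstract; the function names, the example sequence and the rule of stopping at the first in-frame stop codon are assumptions made only for illustration.

```python
from itertools import product

BASES = "ACGU"
STOPS = {"UAA", "UAG", "UGA"}

# Near-cognate codons singled out in the abstract.
FAVORED = {"CUG", "UUG", "GUG", "ACG", "AUA", "AUU"}


def near_cognate_codons(start="AUG"):
    """Return all codons that differ from `start` at exactly one position."""
    codons = []
    for i, base in product(range(3), BASES):
        if base != start[i]:
            codons.append(start[:i] + base + start[i + 1:])
    return codons


def upstream_inframe_candidates(utr5, candidates=FAVORED):
    """Hypothetical helper: scan a 5' UTR that ends immediately before the
    annotated AUG, walking upstream in steps of three nucleotides.

    Returns (offset, codon) pairs, where offset is the position relative to
    the annotated start (negative = upstream). The scan stops at the first
    in-frame stop codon, since an extension cannot read through it.
    """
    utr5 = utr5.upper().replace("T", "U")
    hits = []
    for pos in range(len(utr5) - 3, -1, -3):
        codon = utr5[pos:pos + 3]
        if codon in STOPS:
            break
        if codon in candidates:
            hits.append((pos - len(utr5), codon))
    return hits
```

For example, `upstream_inframe_candidates("GGCUGAAACCC")` reports a CUG nine nucleotides upstream of the annotated start, while a UTR with an intervening in-frame stop codon yields no candidates. A scan like this only lists possible start sites; judging whether an extension is actually translated requires evidence such as the conservation signatures and ribosome profiling data mentioned in the abstract.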