Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Karolina ut 29, H-1113 Budapest, Hungary.
Proc Natl Acad Sci U S A. 2010 Mar 23;107(12):5429-34. doi: 10.1073/pnas.0907841107. Epub 2010 Mar 8.
Numerous human genes display dual coding within alternatively spliced regions, which give rise to distinct protein products that include segments translated in more than one reading frame. To resolve the ensuing protein structural puzzle, we identified 67 human genes with alternative splice variants comprising a dual-coding region at least 75 nucleotides in length and analyzed the structural status of the protein segments they encode. The inspection of their amino acid composition and predictions by the IUPred and PONDR VSL2 algorithms suggest a high propensity for structural disorder in dual-coding regions. In the case of +1 frameshifts, the average level of disorder in the two frames is similarly high (47.2% in the ancestral frame, 58.2% in the derived frame, with the average level of disorder in human proteins being approximately 30%), whereas in the case of -1 frameshifts, there is a significant tendency to become more disordered upon shifting the frame (16.7% in the ancestral frame, 56.3% in the derived frame). The regions encoded by the derived frame are mostly disordered (disorder percentage > 50%) in 39 out of 62 cases, which strongly suggests that structural disorder enables these protein products to exist and function without the need of a highly evolved 3D fold. The potential advantages are also demonstrated by the appearance of novel functions and the high incidence of transcripts escaping nonsense-mediated decay. By discussing several examples, we demonstrate that dual coding may be an effective mechanism for the evolutionary appearance of novel intrinsically disordered regions with new functions.
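The "disorder percentage" and the "mostly disordered" criterion (> 50%) used above can be illustrated with a minimal sketch. It assumes per-residue disorder scores in the range 0 to 1, such as those a predictor like IUPred or PONDR VSL2 would produce, and a per-residue cutoff of 0.5. The cutoff, the function names, and the example scores are illustrative assumptions, not details taken from the study.

```python
from typing import Sequence


def disorder_percentage(scores: Sequence[float], cutoff: float = 0.5) -> float:
    """Return the percentage of residues whose disorder score exceeds `cutoff`.

    `scores` holds one predicted disorder score per residue. The 0.5 cutoff
    is an assumed convention, not a value stated in the abstract.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    disordered = sum(1 for s in scores if s > cutoff)
    return 100.0 * disordered / len(scores)


def is_mostly_disordered(scores: Sequence[float], cutoff: float = 0.5) -> bool:
    """Apply the abstract's criterion: disorder percentage above 50%."""
    return disorder_percentage(scores, cutoff) > 50.0


# Hypothetical scores for the same dual-coding segment read in two frames.
ancestral_frame = [0.2, 0.3, 0.6, 0.4, 0.1]
derived_frame = [0.2, 0.6, 0.7, 0.9, 0.4]

print(disorder_percentage(ancestral_frame), is_mostly_disordered(ancestral_frame))
print(disorder_percentage(derived_frame), is_mostly_disordered(derived_frame))
```

Comparing the two values for one segment mirrors the ancestral-versus-derived frame comparison reported for the +1 and -1 frameshifts.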