Byeon Gun Woo, Expòsit Marc, Baker David, Seelig Georg
Department of Electrical and Computer Engineering, University of Washington, Seattle, WA, USA.
Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, USA.
bioRxiv. 2025 May 7:2025.05.06.652464. doi: 10.1101/2025.05.06.652464.
In nature, viruses frequently evolve overlapping genes (OLGs) in alternative reading frames of the same nucleotide sequence, despite the drastically reduced protein sequence space that results from sharing codon nucleotides between frames. Their existence raises the question of whether amino acid sequences are sufficiently degenerate with respect to protein folding to broadly allow arbitrary pairs of functional proteins to be overlapped. Here, we investigate this question by engineering synthetic OLGs using state-of-the-art generative models. To evaluate the approach, we first design overlapped sequences targeting two different protein families. We then encode distinct, highly ordered de novo protein structures and observe surprisingly high in silico and experimental success rates. This demonstrates that the overlap constraints imposed by the structure of the standard genetic code do not significantly restrict the simultaneous accommodation of well-defined 3D folds in alternative reading frames. Our work suggests that OLG sequences may be frequently accessible in nature and could be readily exploited to compress and constrain synthetic genetic circuits.
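To make the shared-codon constraint concrete, here is a minimal illustrative sketch. It is not the authors' design method, which relies on generative models. It translates one nucleotide sequence in frame 0 and in the +1 and +2 frames using the standard genetic code (NCBI translation table 1). The names `CODON_TABLE` and `translate`, along with the example sequence, are hypothetical. Every nucleotide belongs to a codon in each frame, so a single base change can be silent in one frame and still alter the protein encoded in another.

```python
from itertools import product

# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order:
# first base slowest, third base fastest. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int) -> str:
    """Translate complete codons of seq starting at offset frame (0, 1 or 2)."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    seq = "ATGGCTAGCAAA"          # hypothetical example sequence
    print(translate(seq, 0))      # MASK
    print(translate(seq, 1))      # WLA

    # GCT -> GCA is synonymous in frame 0 (Ala stays Ala), but the same base
    # turns CTA (Leu) into CAA (Gln) in the +1 frame.
    mutant = "ATGGCAAGCAAA"
    print(translate(mutant, 0))   # MASK
    print(translate(mutant, 1))   # WQA
```

The lookup table is built from the 64-letter code string rather than typed out by hand, which keeps the codon order explicit and easy to check against the standard table.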