Yang Chuyue, Hockenberry Adam J, Jewett Michael C, Amaral Luís A N
Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208.
Interdisciplinary Program in Biological Sciences, Northwestern University, Evanston, Illinois 60208.
G3 (Bethesda). 2016 Nov 8;6(11):3467-3474. doi: 10.1534/g3.116.032227.
Efficient and accurate protein synthesis is crucial for organismal survival in competitive environments. Translation efficiency (the number of proteins translated from a single mRNA in a given time period) is the combined result of differential translation initiation, elongation, and termination rates. Previous research identified the Shine-Dalgarno (SD) sequence as a modulator of translation initiation in bacterial genes, while codon usage biases are frequently implicated as a primary determinant of elongation rate variation. Recent studies have suggested that SD sequences within coding sequences may negatively affect translation elongation speed, but this claim remains controversial. Here, we present a metric to quantify the prevalence of SD sequences in coding regions. We analyze hundreds of bacterial genomes and find that the coding sequences of highly expressed genes systematically contain fewer SD sequences than expected, yielding a robust correlation between the normalized occurrence of SD sites and protein abundances across a range of bacterial taxa. We further show that depletion of SD sequences within ribosomal protein genes is correlated with organismal growth rates, supporting the hypothesis of strong selection against the presence of these sequences in coding regions and suggesting their association with translation efficiency in bacteria.
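As a rough illustration of what a "normalized occurrence of SD sites" could look like, the sketch below counts SD-like motif matches in a coding sequence and divides by the mean count in codon-shuffled versions of the same sequence. This is not the metric defined in the paper: the exact-match motif `GGAGG`, the codon-shuffling null model, and all function names are assumptions made for illustration only.

```python
import random


def count_sites(seq, motif="GGAGG"):
    """Count (possibly overlapping) exact occurrences of motif in seq.

    Assumption: an SD-like site is approximated by an exact match to a
    single core motif; this is a simplification chosen for the sketch.
    """
    last_start = len(seq) - len(motif) + 1
    return sum(1 for i in range(last_start) if seq.startswith(motif, i))


def shuffled_codons(seq, rng):
    """Return seq with its codons randomly permuted.

    Any trailing partial codon is dropped. This null model preserves codon
    composition but not the encoded protein (a hypothetical choice).
    """
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    rng.shuffle(codons)
    return "".join(codons)


def normalized_sd_occurrence(seq, motif="GGAGG", n_shuffles=200, seed=0):
    """Return (observed, expected, observed / expected) for one sequence.

    expected is the mean motif count over n_shuffles codon-shuffled copies.
    The ratio is NaN when no shuffled copy contains the motif.
    """
    rng = random.Random(seed)
    observed = count_sites(seq, motif)
    total = sum(
        count_sites(shuffled_codons(seq, rng), motif)
        for _ in range(n_shuffles)
    )
    expected = total / n_shuffles
    ratio = observed / expected if expected > 0 else float("nan")
    return observed, expected, ratio
```

Under these assumptions, a ratio below 1 would indicate fewer SD-like sites than the shuffled null predicts, which is the direction of depletion the abstract reports for highly expressed genes.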