Center for the Development of Therapeutics, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA, 02142, USA.
BMC Biol. 2021 Feb 19;19(1):36. doi: 10.1186/s12915-021-00968-8.
Custom genes have become a common resource in recombinant biology over the last 20 years because of the plummeting cost of DNA synthesis. These genes are often "optimized" to non-native sequences for overexpression in a non-native host by substituting synonymous codons within the coding DNA sequence (CDS). A handful of studies have compared native and optimized CDSs, reporting different levels of soluble product due to the accumulation of misfolded aggregates, variable enzyme activity, and, in at least one report, a change in substrate specificity. To the best of our knowledge, no study has performed a practical comparison of CDSs generated by different codon optimization algorithms or reported the corresponding protein yields.
In our efforts to understand what constitutes an optimized CDS, we found little consensus among codon-optimization algorithms; a roughly equal chance that an algorithm-optimized CDS will increase or diminish recombinant yields compared to the native DNA; near-ubiquitous use of a codon database that was last updated in 2007; and high variability in the CDSs output by some algorithms. We present a case study, using KRas4B, to demonstrate that median codon frequency may be a better predictor of soluble yields than the more commonly used codon adaptation index (CAI).
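To make the two metrics concrete, the following is a minimal sketch of how CAI and a median codon frequency could be computed for one CDS. It is not the authors' implementation. It assumes that "codon frequency" means a codon's share of usage within its synonymous family, which the abstract does not specify; that the standard genetic code applies; and that the codon counts come from a reference set supplied by the reader. All function names are hypothetical, and the counts in the usage lines are made up for illustration.

```python
from collections import Counter
from itertools import product
from math import exp, log
from statistics import median

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SKIPPED = ("M", "W", "*")  # single-codon amino acids and stops carry no choice


def split_codons(cds):
    """Split a CDS into triplets, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def count_codons(reference_cdss):
    """Tally codon occurrences across a reference set of coding sequences."""
    counts = Counter()
    for cds in reference_cdss:
        counts.update(split_codons(cds))
    return counts


def relative_usage(counts):
    """Return (fraction within synonymous family, relative adaptiveness w)."""
    fraction, weight = {}, {}
    for aa in set(AMINO_ACIDS):
        family = [c for c, a in CODON_TO_AA.items() if a == aa]
        total = sum(counts.get(c, 0) for c in family)
        top = max(counts.get(c, 0) for c in family)
        for c in family:
            fraction[c] = counts.get(c, 0) / total if total else 0.0
            weight[c] = counts.get(c, 0) / top if top else 0.0
    return fraction, weight


def scored_codons(cds):
    """Codons of the CDS that encode an amino acid with synonymous choices."""
    return [c for c in split_codons(cds)
            if c in CODON_TO_AA and CODON_TO_AA[c] not in SKIPPED]


def cai(cds, weight):
    """Geometric mean of relative adaptiveness; zero-weight codons are skipped."""
    values = [weight[c] for c in scored_codons(cds) if weight[c] > 0]
    return exp(sum(log(v) for v in values) / len(values)) if values else 0.0


def median_codon_frequency(cds, fraction):
    """Median synonymous-family fraction over the codons of the CDS."""
    values = [fraction[c] for c in scored_codons(cds)]
    return median(values) if values else 0.0


# Usage with made-up counts for alanine codons only (illustrative, not real data).
toy_counts = Counter({"GCT": 3, "GCC": 1})
toy_fraction, toy_weight = relative_usage(toy_counts)
print(cai("GCTGCC", toy_weight), median_codon_frequency("GCTGCC", toy_fraction))
```

Because CAI is a geometric mean, a few rare codons pull it down multiplicatively, whereas the median ignores how extreme the tails are. This is one way two CDSs can rank differently under the two metrics.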
We present a method for visualizing, analyzing, and comparing algorithm-optimized DNA sequences for recombinant protein expression. We encourage researchers to consider whether DNA optimization is right for their experiments and to work toward improving the reproducibility of published recombinant work by publishing non-native CDSs.