Taneda Akito, Asai Kiyoshi
Graduate School of Science and Technology, Hirosaki University, Hirosaki, Aomori 036-8561, Japan.
Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba 277-8562, Japan.
Comput Struct Biotechnol J. 2020 Jun 30;18:1811-1818. doi: 10.1016/j.csbj.2020.06.035. eCollection 2020.
Codon optimization of protein-coding sequences (CDSs) is a widely used technique for promoting the heterologous expression of target genes. In codon optimization, the combinatorial space of nucleotide sequences that encode a given amino acid sequence, subject to user-prescribed forbidden sequence motifs, is explored to optimize multiple criteria. Although evolutionary algorithms have been used to tackle such complex codon optimization problems, evolutionary codon optimization tools do not guarantee finding the optimal solutions to these multicriteria problems. We have developed a novel multicriteria dynamic programming algorithm, COSMO. Using this algorithm, we can obtain all Pareto-optimal solutions for multiple CDS features, including codon usage, codon context, and the number of hidden stop codons. User-prescribed forbidden sequence motifs are rigorously excluded from the Pareto-optimal solutions. To accelerate CDS design with COSMO, we introduced constraints that reduce, in a branch-and-bound manner, the number of Pareto-optimal solutions to be processed. We benchmarked COSMO for run-time and the number of generated solutions by adapting selected human genes to yeast codon usage frequencies, and found that the constraints effectively reduce the run-time. We also benchmarked a multi-objective genetic algorithm (MOGA) for CDS design on the same two aspects and compared its performance with that of COSMO. In this comparison, (i) MOGA identified significantly fewer Pareto-optimal solutions than COSMO, and (ii) the MOGA solutions did not achieve the mean hypervolume values obtained by COSMO. These results suggest that generating the complete set of Pareto-optimal solutions to codon optimization problems is difficult for MOGA.
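The abstract does not give COSMO's recurrences, so the following is only a minimal sketch of the general idea of multicriteria dynamic programming for CDS design, not the authors' implementation. Everything in it is an assumption for illustration: the toy codon table and made-up usage frequencies, the hypothetical function names (`pareto_cds`, `pareto_filter`, `brute_force_front`), and the choice of just two objectives, namely the summed log codon frequency and the number of off-frame (+1/+2) stop codons, where more hidden stops are treated as better (negate the count to penalize them instead). Codon context and the branch-and-bound constraints are omitted. Each DP state remembers the last k - 1 nucleotides (k being the longest forbidden motif, at least 3), so an extension that would create a forbidden motif across a codon boundary is discarded. Because both objectives are additive and future gains depend only on that state, dominated partial solutions can be pruned within each state.

```python
import math
from itertools import product

# Hypothetical toy data: a tiny codon table and made-up usage frequencies
# (not real yeast values, and not COSMO's scoring).
CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "L": ["TTA", "TTG", "CTG"],
    "*": ["TAA", "TGA"],
}
USAGE = {"ATG": 1.0, "AAA": 0.6, "AAG": 0.4, "TTA": 0.3, "TTG": 0.5,
         "CTG": 0.2, "TAA": 0.6, "TGA": 0.4}
STOPS = {"TAA", "TAG", "TGA"}


def dominates(a, b):
    """True if score vector a is >= b everywhere and > b somewhere (all maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_filter(items):
    """Keep non-dominated (scores, seq) pairs, one sequence per distinct score vector."""
    front = []
    for scores, seq in items:
        if any(dominates(other, scores) for other, _ in items):
            continue
        if all(other != scores for other, _ in front):
            front.append((scores, seq))
    return front


def pareto_cds(protein, forbidden=()):
    """Pareto front of (sum of log codon usage, hidden stop count) over allowed CDSs.

    DP state = the last k - 1 nucleotides (k = longest motif, at least 3), which is
    enough context to see motifs and off-frame stops that span codon boundaries.
    Both objectives are additive, so dominated entries within a state can be pruned.
    """
    k = max([len(m) for m in forbidden] + [3])
    states = {"": [((0.0, 0), "")]}  # suffix -> Pareto set of ((usage, hidden), seq)
    for aa in protein:
        candidates = {}
        for suffix, front in states.items():
            for codon in CODONS[aa]:
                window = suffix + codon
                if any(m in window for m in forbidden):
                    continue  # extension would create a forbidden motif: prune it
                j = len(suffix)  # position of the new codon inside window
                # Off-frame (+1/+2) triplets that end inside the new codon.
                hidden = sum(window[s:s + 3] in STOPS for s in (j - 2, j - 1) if s >= 0)
                key = window[-(k - 1):]
                for (usage, stops), seq in front:
                    scores = (usage + math.log(USAGE[codon]), stops + hidden)
                    candidates.setdefault(key, []).append((scores, seq + codon))
        states = {key: pareto_filter(items) for key, items in candidates.items()}
    return pareto_filter([item for front in states.values() for item in front])


def brute_force_front(protein, forbidden=()):
    """Exhaustive reference (exponential): use only to check tiny inputs."""
    items = []
    for codons in product(*(CODONS[aa] for aa in protein)):
        seq = "".join(codons)
        if any(m in seq for m in forbidden):
            continue
        usage = sum(math.log(USAGE[c]) for c in codons)
        hidden = sum(seq[s:s + 3] in STOPS for s in range(len(seq) - 2) if s % 3)
        items.append(((usage, hidden), seq))
    return pareto_filter(items)


if __name__ == "__main__":
    for (usage, hidden), seq in pareto_cds("MKLK*", forbidden=("AAGT",)):
        print(f"{seq}  log-usage={usage:.3f}  hidden stops={hidden}")
```

This sketch keeps one sequence per Pareto-optimal score vector, whereas the abstract describes COSMO as returning all Pareto-optimal solutions; storing a list of tied sequences per score vector would be the natural extension. The `brute_force_front` helper exists only to check the DP against exhaustive enumeration on tiny inputs.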