Karaşan Oya, Şen Alper, Tiryaki Banu, Cicek A Ercument
Department of Industrial Engineering, Bilkent University, Ankara 06800, Turkey.
Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey.
Bioinformatics. 2022 Aug 10;38(16):3935-3941. doi: 10.1093/bioinformatics/btac428.
Synthesizing genes to be expressed in other organisms is an essential tool in biotechnology. While the many-to-one mapping from codons to amino acids makes the genetic code degenerate, codon usage in a particular organism is not random either. This bias in codon use may have a remarkable effect on the level of gene expression. A number of measures have been developed to quantify a given codon sequence's strength to express a gene in a host organism. Codon optimization aims to find a codon sequence that will optimize one or more of these measures. Efficient computational approaches are needed since the possible number of codon sequences grows exponentially as the number of amino acids increases.
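The exponential growth mentioned above can be made concrete with a short count. The sketch below is illustrative only: it assumes the standard genetic code (61 sense codons) and one-letter amino acid symbols, and the names `DEGENERACY` and `count_codon_sequences` are hypothetical.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter symbols; the 61 sense codons, stop codons excluded).
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_codon_sequences(protein: str) -> int:
    """Return how many distinct codon sequences encode `protein`."""
    # Each position contributes an independent choice among its synonymous
    # codons, so the total is the product of the per-residue degeneracies.
    return prod(DEGENERACY[aa] for aa in protein)


if __name__ == "__main__":
    print(count_codon_sequences("MKL"))      # 1 * 2 * 6 = 12
    print(count_codon_sequences("L" * 10))   # 6 ** 10
```

Because the count is a product over residues, every additional amino acid with more than one codon multiplies the search space, which is why exhaustive enumeration is impractical for full-length proteins.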
We develop a unifying modeling approach for codon optimization. With our mathematical formulations based on graph/network representations of amino acid sequences, any combination of measures can be optimized in the same framework by finding a path that satisfies additional constraints in an acyclic layered network. We tested our approach on pairs of objectives commonly used in the literature, namely, Codon Pair Bias versus Codon Adaptation Index and Relative Codon Pair Bias versus Relative Codon Bias. However, our framework is general enough to handle any number of objectives concurrently, together with restrictions or preferences on the use of specific nucleotide sequences. We implemented our models using Python's Gurobi interface and showed the efficacy of our approach even for the largest proteins available. We also provide experiments showing that highly expressed genes have objective values close to the optimized values in the bi-objective codon design problem.
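The layered-network idea can be illustrated with a small sketch. It is not the authors' Gurobi formulation. It uses plain dynamic programming, which suffices only in the simplified case of an unconstrained weighted sum of one codon-level score and one codon-pair score. Layer i holds the synonymous codons of the i-th amino acid, and arcs join codons in consecutive layers. The codon table is a toy subset, all numeric weights are made up for illustration, and the names `CODONS`, `ADAPT`, `PAIR`, `best_codon_path` and `lam` are hypothetical.

```python
from math import log

# Toy synonymous-codon table; a real one covers all 20 amino acids.
CODONS = {"M": ["ATG"], "K": ["AAA", "AAG"], "F": ["TTT", "TTC"]}

# Hypothetical relative adaptiveness of each codon, in (0, 1].
ADAPT = {"ATG": 1.0, "AAA": 1.0, "AAG": 0.3, "TTT": 0.6, "TTC": 1.0}

# Hypothetical codon-pair scores; pairs not listed score 0.0.
PAIR = {
    ("ATG", "AAG"): 0.4,
    ("AAA", "TTT"): 0.2,
    ("AAA", "TTC"): -2.0,
    ("AAG", "TTC"): 1.0,
}


def best_codon_path(protein, lam=0.5):
    """Return (score, codons) of a maximum-weight path through the layers.

    Node weight: lam * log(adaptiveness); arc weight: (1 - lam) * pair score.
    """
    if not protein:
        return 0.0, []

    def node(codon):
        return lam * log(ADAPT[codon])

    def arc(prev, codon):
        return (1.0 - lam) * PAIR.get((prev, codon), 0.0)

    # best[c] = (score, path) of the best partial path ending at codon c.
    best = {c: (node(c), [c]) for c in CODONS[protein[0]]}
    for aa in protein[1:]:
        nxt = {}
        for c in CODONS[aa]:
            # Pick the best predecessor in the previous layer for codon c.
            score, path = max(
                ((s + arc(p, c), pth) for p, (s, pth) in best.items()),
                key=lambda t: t[0],
            )
            nxt[c] = (score + node(c), path + [c])
        best = nxt
    return max(best.values(), key=lambda t: t[0])


if __name__ == "__main__":
    print(best_codon_path("MKF", lam=1.0))  # codon-level score only
    print(best_codon_path("MKF", lam=0.5))  # equal weight on both scores
```

With `lam=1.0` the path simply picks the most adapted codon in every layer. Lowering `lam` lets the pair scores pull the path toward a different codon for lysine. Side constraints, such as forbidden nucleotide motifs, or a full trade-off curve between two objectives go beyond this plain recursion and are where an integer-programming solver becomes useful.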
Availability and implementation: http://alpersen.bilkent.edu.tr/NetworkCodon.zip.
Supplementary data are available at Bioinformatics online.