Tang Lixia, Wang Xiong, Ru Beibei, Sun Hengfei, Huang Jian, Gao Hui
School of Life Science and Technology.
School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu, China.
Biotechniques. 2014 Jun 1;56(6):301-2, 304, 306-8, passim. doi: 10.2144/000114177. eCollection 2014 Jun.
Recent computational and bioinformatics advances have enabled the efficient creation of novel biocatalysts by reducing amino acid variability at hot spot regions. To further expand the utility of this strategy, we present the Multi-site Degenerate Codon Analyzer (MDC-Analyzer), a tool for the automated design of intelligent mutagenesis libraries that completely cover user-defined randomized sequences, particularly when multiple contiguous and/or adjacent sites are targeted. After an objective function is defined, candidate optimal degenerate PCR primer profiles are explored automatically with the Greedy Best-First Search heuristic. Compared with the previously developed DC-Analyzer, MDC-Analyzer tolerates a small number of undesired sequences as a trade-off between the number of degenerate primers and the size of the encoded library, while retaining all the benefits of DC-Analyzer and adding the ability to randomize multiple contiguous sites. MDC-Analyzer was validated using a series of randomly generated mutation schemes and experimental case studies on the evolution of halohydrin dehalogenase. These showed that the MDC methodology is more efficient than other methods and is particularly well suited to exploring protein sequence space with data-driven protein engineering strategies.
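To make the search idea concrete, the following is a minimal single-site sketch, not MDC-Analyzer's implementation. It expands IUPAC degenerate codons under the standard genetic code and runs a greedy best-first search over sets of degenerate codons (one per primer). The abstract does not specify MDC-Analyzer's objective function or its multi-site handling, so the names `design_site`, `expand`, `translate`, and `CANDIDATES`, the lexicographic heuristic (uncovered target residues, then undesired codons, then primer count, then library size), and the absolute `max_undesired` tolerance are all hypothetical assumptions chosen for illustration.

```python
import heapq
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon):
    """Translate one unambiguous codon; '*' marks a stop codon."""
    i, j, k = (BASES.index(b) for b in codon)
    return AA_TABLE[16 * i + 4 * j + k]


def expand(degenerate):
    """All concrete codons encoded by a degenerate codon, e.g. NNK -> 32 codons."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in degenerate))]


# All 15**3 = 3375 degenerate codons, each mapped to its encoded amino acids.
CANDIDATES = {
    "".join(d): [translate(c) for c in expand("".join(d))]
    for d in product(IUPAC, repeat=3)
}


def design_site(target, max_undesired=0, max_primers=4):
    """Hypothetical greedy best-first search for a small set of degenerate
    codons covering every residue in `target`, allowing at most
    `max_undesired` off-target codons (non-target residues or stops)."""
    target = frozenset(target)

    def score(state):
        aas = [aa for d in state for aa in CANDIDATES[d]]
        uncovered = len(target - set(aas))
        undesired = sum(aa not in target for aa in aas)
        # Assumed heuristic: fewest uncovered targets first, then fewest
        # undesired codons, then fewest primers, then smallest library.
        return (uncovered, undesired, len(state), len(aas))

    start = ()
    frontier = [(score(start), start)]
    seen = {start}
    while frontier:
        (uncovered, _, n_primers, _), state = heapq.heappop(frontier)
        if uncovered == 0:
            return state  # goal test on pop, as in best-first search
        if n_primers == max_primers:
            continue
        covered = {aa for d in state for aa in CANDIDATES[d]}
        missing = target - covered
        for cand, aas in CANDIDATES.items():
            if not missing.intersection(aas):
                continue  # each added primer must cover a missing target
            child = tuple(sorted(state + (cand,)))
            if child in seen:
                continue
            s = score(child)
            if s[1] > max_undesired:
                continue  # undesired codons only accumulate, so prune
            seen.add(child)
            heapq.heappush(frontier, (s, child))
    return None


# Full randomization of one site, tolerating a single stop codon.
print(design_site("ACDEFGHIKLMNPQRSTVWY", max_undesired=1))  # ('NNK',)
```

With a stricter target such as `design_site("DEKR")` (zero undesired codons allowed), the search returns two primers instead of one, which mirrors, at a single site, the trade-off between primer count and encoded library size described above. Because best-first search is greedy, it is not guaranteed to find the globally smallest primer set.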