Li Rui-Xiang, Zhang Ning-Ning, Wu Bin, OuYang Bo, Shen Hong-Bin
Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 201203, China.
Comput Struct Biotechnol J. 2021 Apr 25;19:2575-2587. doi: 10.1016/j.csbj.2021.04.046. eCollection 2021.
Protein design usually involves a sequence search process and evaluation criteria. Commonly used methods rely on Monte Carlo or simulated annealing with a single energy function, an approach that is often highly time-consuming and limited by the accuracy of that energy function. In this report, we introduce Hydra, a multiobjective algorithm for protein design that optimizes solutions under two different energy functions simultaneously and uses the latent quantitative relationships between amino acid types to facilitate the search. The framework uses two kinds of prior information to transform the original disordered, discrete sequence space into a relatively ordered space, and decoy sequences are searched in this ordered space with a multiobjective swarm intelligence algorithm. The algorithm features high accuracy and a fast search process. Our method was tested on 40 targets covering different fold classes, which were computationally verified to be well folded, and a design for the 1UBQ fold was solved experimentally by NMR, in excellent agreement with the native structure, with a backbone RMSD of 1.074 Å. The Hydra software package can be downloaded for academic use from http://www.csbio.sjtu.edu.cn/bioinf/HYDRA/.
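To illustrate what optimizing under two energy functions simultaneously means, the following is a minimal Python sketch of Pareto (non-dominated) selection over candidate sequences. It is not Hydra's implementation: the two scoring functions `toy_energy_a` and `toy_energy_b`, the residue groupings, and the candidate list are hypothetical stand-ins, and the swarm intelligence search and the ordering of sequence space described in the abstract are not shown.

```python
from typing import Callable, List, Sequence, Tuple

Scores = Tuple[float, ...]

# Hypothetical residue groupings, used only to build toy scoring functions.
HYDROPHOBIC = set("AVILMFWY")
CHARGED = set("DEKR")


def toy_energy_a(seq: str) -> float:
    """Toy stand-in for a first energy function (lower is better)."""
    return -float(sum(res in HYDROPHOBIC for res in seq))


def toy_energy_b(seq: str) -> float:
    """Toy stand-in for a second energy function (lower is better)."""
    return -float(sum(res in CHARGED for res in seq))


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(
    candidates: Sequence[str],
    energy_fns: Sequence[Callable[[str], float]],
) -> List[Tuple[str, Scores]]:
    """Return the candidates that no other candidate dominates, in input order."""
    scored = [(seq, tuple(fn(seq) for fn in energy_fns)) for seq in candidates]
    return [
        (seq, s)
        for seq, s in scored
        if not any(dominates(other, s) for _, other in scored)
    ]


# Hypothetical candidate sequences, chosen only to exercise the filter.
candidates = ["AVIL", "DEKR", "AVDE", "GGGG", "AGDG"]
front = pareto_front(candidates, [toy_energy_a, toy_energy_b])
print([seq for seq, _ in front])  # prints ['AVIL', 'DEKR', 'AVDE']
```

In this toy setting, no single sequence is best under both scores, so the filter keeps a set of trade-off solutions rather than one optimum. A single-energy search would instead collapse the choice to whichever sequence minimizes that one function.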