Pal Abantika, Mulumudy Rohith, Mitra Pralay
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India.
Proteins. 2022 Mar;90(3):658-669. doi: 10.1002/prot.26263. Epub 2021 Oct 23.
Given a target protein structure, the primary objective of protein design is to find amino acid sequences that will fold into that three-dimensional structure. The protein design problem is non-deterministic polynomial-time hard (NP-hard), as the sequence search space grows exponentially with protein length. To ensure better search space exploration and faster convergence, we propose a parallel protein design algorithm based on protein modularity. The modular architecture of the protein structure is exploited by considering an intermediate level of structural organization between secondary structure and domain, termed the protein unit (PU). We adopt a divide-and-conquer approach in which a protein is split into PUs and each PU region is explored in parallel. Further analysis shows that our shared-memory implementation of the modularity-based parallel sequence search explores the search space better than traditional full-protein design. Sequence-based analysis of the designed sequences shows an average sequence similarity of 39.7% on the benchmark data set. Structure-based comparison of the modeled structures of the designed proteins with the target structures yielded an average root-mean-square deviation of 1.17 Å and an average template modeling score of 0.89. Selected modeled structures of the designed sequences were validated using 100 ns molecular dynamics simulations, in which 80% of the proteins showed stability better than or similar to that of their respective target proteins. Our study suggests that the modularity-based protein design algorithm can be extended to protein interaction design as well.
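The divide-and-conquer idea can be illustrated with a toy sketch. It is not the authors' implementation: the hydrophobic/polar target pattern, the PU boundaries, the `toy_score` function, and the simple hill-climbing search are all hypothetical stand-ins, since the abstract does not specify the energy function or search procedure. The sketch assumes each unit can be searched independently, so that a protein of length L split into units of lengths L_i searches spaces of size 20^(L_i) separately instead of one space of size 20^L.

```python
from concurrent.futures import ThreadPoolExecutor
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # simplified classification, for illustration only


def toy_score(seq, pattern):
    """Hypothetical stand-in for an energy function: count positions whose
    hydrophobicity agrees with the target pattern ('H' or 'P')."""
    return sum((aa in HYDROPHOBIC) == (p == "H") for aa, p in zip(seq, pattern))


def design_unit(pattern, seed, steps=2000):
    """Hill-climbing sequence search restricted to one unit (assumed PU region)."""
    rng = random.Random(seed)  # per-unit generator keeps runs reproducible
    seq = [rng.choice(AMINO_ACIDS) for _ in pattern]
    best = toy_score(seq, pattern)
    for _ in range(steps):
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.choice(AMINO_ACIDS)
        new = toy_score(seq, pattern)
        if new >= best:
            best = new      # keep the mutation
        else:
            seq[i] = old    # revert the mutation
    return "".join(seq)


def design_protein(pattern, boundaries, seed=0):
    """Split the target into units, search each unit in its own thread,
    then concatenate the per-unit results in order."""
    units = [pattern[start:end] for start, end in boundaries]
    seeds = range(seed, seed + len(units))
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(design_unit, units, seeds))
    return "".join(parts)


if __name__ == "__main__":
    target = "HHPPHPHHPPPHHPHP"                # hypothetical target pattern
    pu_bounds = [(0, 6), (6, 11), (11, 16)]    # hypothetical PU boundaries
    designed = design_protein(target, pu_bounds)
    print(designed, toy_score(designed, target))
```

Threads are used here only to mirror a shared-memory layout; in CPython they do not speed up CPU-bound work, and a real design run would also need to account for interactions between residues in different units, which this sketch ignores.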