Huang Sheng-You, Zou Xiaoqin
Dalton Cardiovascular Research Center and Department of Biochemistry, University of Missouri, Columbia, Missouri 65211, USA.
Proteins. 2007 Feb 1;66(2):399-421. doi: 10.1002/prot.21214.
One approach to incorporating protein flexibility into molecular docking is to use an ensemble of multiple protein structures. Sequentially docking each ligand into a large number of protein structures is too computationally expensive for large-scale database screening, so it is challenging to balance docking accuracy against computational efficiency. In this work, we have developed a fast, novel docking algorithm utilizing multiple protein structures, referred to as ensemble docking, to account for protein structural variations. The algorithm can simultaneously dock a ligand into an ensemble of protein structures and automatically select an optimal protein structure that best fits the ligand by optimizing both ligand coordinates and the conformational variable m, where m represents the m-th structure in the protein ensemble. The docking algorithm was validated on 10 protein ensembles containing 105 crystal structures and 87 ligands in terms of binding mode and energy score predictions. A success rate of 93% was obtained with the criterion of root-mean-square deviation < 2.5 Å if the top five orientations for each ligand were considered. This is comparable to the success rate of sequential docking, in which scores from the individual docking runs are merged into one list by re-ranking, and significantly better than that of single rigid-receptor docking (75% on average). Similar trends were also observed in binding score predictions and enrichment tests of virtual database screening. The ensemble docking algorithm is computationally efficient, with a computational time comparable to that for docking a ligand into a single protein structure. In contrast, the computational time for the sequential docking method increases linearly with the number of protein structures in the ensemble. The algorithm was further evaluated using a more realistic ensemble in which the corresponding bound protein structures of inhibitors were excluded.
The results show that ensemble docking successfully predicts the binding modes of the inhibitors and discriminates the inhibitors from a set of noninhibitors with similar chemical properties. Although multiple experimental structures were used in the present work, our algorithm can be easily applied to multiple protein conformations generated by computational methods, and can help improve the efficiency of other existing multiple protein structure (MPS)-based methods for accommodating protein flexibility.
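The contrast between the two strategies described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the quadratic score, the gradient-descent relaxation, and every name in it (`score`, `relax`, `ensemble_dock`, `sequential_dock`, `centers`, `offsets`) are assumptions made up for illustration. The ensemble version treats the structure index m as a variable that is re-selected while the ligand coordinates are refined in a single run; the sequential version docks into every structure separately and merges the scores into one re-ranked list.

```python
def score(x, m, centers, offsets):
    # Hypothetical toy energy: each structure m has a pocket centre and a
    # best attainable score; the score rises quadratically away from the centre.
    return offsets[m] + sum((xi - ci) ** 2 for xi, ci in zip(x, centers[m]))


def relax(x, m, centers, step=0.25):
    # One gradient-descent step on the ligand coordinates for structure m.
    return [xi - step * 2.0 * (xi - ci) for xi, ci in zip(x, centers[m])]


def ensemble_dock(x0, centers, offsets, n_iter=100):
    # Single run: refine the pose and re-select the structure index m together.
    x = list(x0)
    structures = range(len(centers))
    m = min(structures, key=lambda k: score(x, k, centers, offsets))
    for _ in range(n_iter):
        x = relax(x, m, centers)
        m = min(structures, key=lambda k: score(x, k, centers, offsets))
    return score(x, m, centers, offsets), m, x


def sequential_dock(x0, centers, offsets, n_iter=100):
    # One full docking run per structure, then merge and re-rank by score.
    ranked = []
    for m in range(len(centers)):
        x = list(x0)
        for _ in range(n_iter):
            x = relax(x, m, centers)
        ranked.append((score(x, m, centers, offsets), m, x))
    return sorted(ranked)


# Made-up ensemble of three structures in a 2-D pose space.
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
offsets = [-5.0, -8.0, -6.0]
best_score, best_m, best_pose = ensemble_dock((2.5, 0.5), centers, offsets)
ranked = sequential_dock((2.5, 0.5), centers, offsets)
```

In this toy the sequential version performs one full optimization per structure, so its cost grows linearly with the ensemble size, as the abstract describes. The ensemble version performs a single optimization, but it still evaluates every structure's score at each step, so the sketch does not reproduce the efficiency reported for the actual algorithm. Like any local search, it can also settle on a suboptimal structure from a poor starting pose.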