Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, United States.
Comput Biol Med. 2012 Jul;42(7):758-71. doi: 10.1016/j.compbiomed.2012.05.001. Epub 2012 Jun 2.
We present a scalable and accurate method for classifying protein-ligand binding geometries in molecular docking. Our method is a three-step process: the first step encodes the geometry of a three-dimensional (3D) ligand conformation as a single 3D point; the second step builds an octree by assigning an octant identifier to every point in the space under consideration; and the third step performs octree-based clustering on the reduced conformation space and identifies the densest octant. We adapt our method for MapReduce and implement it in Hadoop. The load balancing, fault tolerance, and scalability of MapReduce allow screening of very large conformation spaces that are not approachable with traditional clustering methods. We analyze results of docking trials for 23 protein-ligand complexes of HIV protease, 21 protein-ligand complexes of trypsin, and 12 protein-ligand complexes of P38alpha kinase. We also analyze cross-docking trials for 24 ligands, each docked into 24 protein conformations of HIV protease, and receptor ensemble docking trials for 24 ligands, each docked into a pool of HIV protease receptors. In all of these docking assessments, our method identifies native ligand geometries significantly more accurately than energy-only scoring. These advantages make our clustering approach attractive for complex applications in real-world drug design. We demonstrate that our method is particularly useful for clustering docking results obtained with a minimal ensemble of representative protein conformational states (receptor ensemble docking), which is now a common strategy for addressing protein flexibility in molecular docking.
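The three steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how a conformation is encoded, so `encode_conformation` uses the atom centroid purely as a stand-in, and the function names, the bounding box `lo`/`hi`, and the fixed `depth` are all assumptions made for illustration. The octant identifier is built as one digit (0-7) per octree level, and the densest octant is found by counting identifiers.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float, float]


def encode_conformation(atoms: Sequence[Point]) -> Point:
    """Step 1 (stand-in): reduce a conformation to one 3D point.

    ASSUMPTION: the centroid is used only as a placeholder encoding;
    the abstract does not say which encoding the method uses.
    """
    n = len(atoms)
    return (
        sum(a[0] for a in atoms) / n,
        sum(a[1] for a in atoms) / n,
        sum(a[2] for a in atoms) / n,
    )


def octant_id(point: Point, lo: Point, hi: Point, depth: int) -> str:
    """Step 2: assign an octant identifier, one digit (0-7) per level.

    At each level the current box is halved along x, y and z; bit 0, 1, 2
    of the digit is set when the point lies in the upper half of that axis.
    """
    box_lo, box_hi = list(lo), list(hi)
    digits = []
    for _ in range(depth):
        digit = 0
        for axis in range(3):
            mid = (box_lo[axis] + box_hi[axis]) / 2.0
            if point[axis] >= mid:
                digit |= 1 << axis
                box_lo[axis] = mid  # descend into the upper half
            else:
                box_hi[axis] = mid  # descend into the lower half
        digits.append(str(digit))
    return "".join(digits)


def densest_octant(
    points: Iterable[Point], lo: Point, hi: Point, depth: int
) -> Tuple[str, int]:
    """Step 3: count points per octant and return (identifier, count).

    The Counter plays the role that a map step emitting (octant_id, 1)
    and a summing reduce step would play in a MapReduce job.
    """
    counts = Counter(octant_id(p, lo, hi, depth) for p in points)
    return counts.most_common(1)[0]
```

For example, with the unit cube as bounding box, `densest_octant([(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)], (0, 0, 0), (1, 1, 1), 2)` groups the first two points in octant `"00"`. Because identifiers at a deeper level share their prefix with the parent octant, counts at coarser levels can be recovered by truncating the identifier.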