Bottegoni Giovanni, Rocchia Walter, Recanatini Maurizio, Cavalli Andrea
Department of Pharmaceutical Sciences, University of Bologna, Via Belmeloro 6, I-40126, Bologna, Italy.
Bioinformatics. 2006 Jul 15;22(14):e58-65. doi: 10.1093/bioinformatics/btl212.
Sampling the conformational space is a fundamental step in both ligand- and structure-based drug design. However, organizing the resulting molecular conformations rationally remains a challenge. In drug design applications, the sampling process yields a redundant set of conformations whose thorough analysis can be time-consuming or even prohibitive. We propose a statistical approach based on cluster analysis, aimed at rationalizing the output of methods such as Monte Carlo, genetic, and reconstruction algorithms. Although some software already implements clustering procedures, a universally accepted protocol is still missing.
We integrated hierarchical agglomerative cluster analysis with a clusterability assessment method and a user-independent cutting rule to form a global protocol, which we implemented in a MATLAB metalanguage program (AClAP). We tested it on the conformational spaces of a diverse set of drugs, generated via Metropolis Monte Carlo simulation, and on the poses obtained from repeated docking runs performed with four widely used programs. In our tests, AClAP markedly reduced the dimensionality of the original datasets at negligible computational cost. Moreover, when applied to the combined outcomes of multiple docking programs, it was able to point to the crystallographic pose.
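As a rough illustration of the kind of pipeline described above, the following minimal Python sketch performs hierarchical agglomerative clustering and then cuts the hierarchy without a user-supplied threshold. It is not the AClAP code: the average-linkage criterion, the "largest gap between successive merge heights" cutting rule, the Euclidean distance on toy 2D points (standing in for a conformational distance such as RMSD), and the function names `average_linkage` and `cut_at_largest_gap` are all assumptions made for the example. The clusterability assessment step is omitted.

```python
from itertools import combinations
from math import dist


def average_linkage(points):
    """Agglomerative clustering with average linkage (an assumed criterion).

    Returns one (merge_height, partition_after_merge) entry per merge.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean pairwise distance between the members of the two clusters.
            total = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        history.append((d, sorted(sorted(c) for c in clusters)))
    return history


def cut_at_largest_gap(points):
    """Hypothetical threshold-free cut: stop just before the merge that
    causes the largest jump in merge height."""
    history = average_linkage(points)
    heights = [h for h, _ in history]
    if len(heights) < 2:
        # Too few merges to compare heights; leave every point on its own.
        return [[i] for i in range(len(points))]
    k = max(range(len(heights) - 1),
            key=lambda i: heights[i + 1] - heights[i])
    return history[k][1]


# Toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(cut_at_largest_gap(toy_points))  # → [[0, 1, 2], [3, 4, 5]]
```

In a real setting, the pairwise distances would come from the conformations or docking poses themselves, and a representative of each cluster would be kept for further analysis.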
AClAP is available at the "AClAP" section of the website http://www.scfarm.unibo.it.