An Jianghong, Totrov Maxim, Abagyan Ruben
Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA 92037, USA.
Genome Inform. 2004;15(2):31-41.
We have developed a new computational algorithm for de novo identification of protein-ligand binding pockets and validated it at large scale on two datasets collected systematically from all crystallographic structures in the Protein Data Bank (PDB). The algorithm, called DrugSite, takes a three-dimensional protein structure as input and returns the location, volume and shape of putative small-molecule binding sites, using a physical potential and no knowledge of a potential ligand molecule. We validated the method on 17,126 binding sites from complexes and apo-structures in the PDB. Of 5,616 binding sites from protein-ligand complexes, 98.8% were identified by predicted pockets. In proteins with known binding sites, 80.9% of the sites were predicted by the largest predicted pocket and 92.7% by one of the first two. The average ratio of predicted contact area to the total surface area of the protein was 4.7% for the predicted pockets. In only 1.2% of cases was no "pocket density" found at the ligand location. Further, 98.6% of the 11,510 binding sites collected from apo-structures were predicted. The algorithm is accurate and fast enough to predict protein-ligand binding sites of uncharacterized protein structures, suggest new allosteric druggable pockets, evaluate the druggability of protein-protein interfaces and prioritize molecular targets by druggability. Furthermore, the known and predicted binding pockets for the proteome of a particular organism can be clustered into a "pocketome", which can be used for rapid evaluation of the possible binding partners of a given chemical compound.
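The ranking statistics quoted above (success with the largest pocket, and with the first two) can be illustrated with a minimal Python sketch. This is not the DrugSite code. The `Pocket` class, the residue-set representation of a binding site, the `covers` overlap test and its 0.5 threshold are all hypothetical choices made for illustration, since the abstract does not state the criterion used to decide that a pocket identifies a site. The toy data are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pocket:
    """A predicted pocket: its volume and the residues lining it (hypothetical format)."""
    volume: float
    residues: frozenset


def covers(pocket, site_residues, min_overlap=0.5):
    """Return True if the pocket contains at least `min_overlap` of the site residues.

    The 0.5 threshold is an illustrative assumption, not the paper's criterion.
    """
    if not site_residues:
        return False
    shared = len(pocket.residues & site_residues)
    return shared / len(site_residues) >= min_overlap


def top_k_success(cases, k):
    """Fraction of cases whose known site is covered by one of the k largest pockets.

    `cases` is a list of (pockets, site_residues) pairs, one per protein.
    """
    if not cases:
        raise ValueError("cases must not be empty")
    hits = 0
    for pockets, site in cases:
        # Rank predicted pockets from largest to smallest volume.
        ranked = sorted(pockets, key=lambda p: p.volume, reverse=True)
        if any(covers(p, site) for p in ranked[:k]):
            hits += 1
    return hits / len(cases)


# Toy data: three proteins with made-up volumes and residue numbers.
cases = [
    # Known site matches the largest pocket.
    ([Pocket(400.0, frozenset({10, 11, 12})), Pocket(150.0, frozenset({40, 41}))],
     frozenset({10, 11, 12, 13})),
    # Known site matches only the second-largest pocket.
    ([Pocket(350.0, frozenset({5, 6})), Pocket(300.0, frozenset({20, 21, 22}))],
     frozenset({20, 21})),
    # Known site matches no predicted pocket.
    ([Pocket(200.0, frozenset({1, 2})), Pocket(100.0, frozenset({3, 4}))],
     frozenset({90, 91})),
]

top1 = top_k_success(cases, k=1)
top2 = top_k_success(cases, k=2)
print(f"top-1 success: {top1:.3f}, top-2 success: {top2:.3f}")
```

On the toy data, one of three sites is recovered by the largest pocket and two of three by the first two pockets. The same bookkeeping applied to a real benchmark would yield figures of the kind reported in the abstract, provided the overlap criterion matches the one the authors used.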