Informatics Institute, University of Missouri, Columbia, MO 65211, USA.
Bioinformatics. 2012 May 15;28(10):1345-52. doi: 10.1093/bioinformatics/bts138. Epub 2012 Apr 6.
Finding geometrically similar protein binding sites is crucial for understanding protein functions and can provide valuable information for protein-protein docking and drug discovery. As the number of known protein-protein interaction structures has dramatically increased, a high-throughput and accurate protein binding site comparison method is essential. Traditional alignment-based methods can provide accurate correspondence between the binding sites but are computationally expensive.
In this article, we present a novel method for comparing protein binding sites using a 'visual words' representation (PBSword). We first extract geometric features of binding site surfaces and build a vocabulary of visual words by clustering a large set of feature descriptors. We then describe each binding site surface with a high-dimensional vector that encodes the frequency of visual words, enhanced by the spatial relationships among them. Finally, we measure the similarity of binding sites using metric space operations, which allow fast comparisons between protein binding sites. Our experimental results show that, on a non-redundant dataset, PBSword achieves classification accuracy comparable to that of an alignment-based method and improves the accuracy of a feature-based method by 36%. PBSword is also significantly more efficient than the alignment-based method.
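The pipeline described above (cluster descriptors into a vocabulary, describe each site by its word frequencies, compare sites with a metric) can be illustrated with a minimal sketch. This is not the PBSword implementation: the abstract does not specify the geometric descriptors, the clustering algorithm, the vocabulary size, or the metric, so the sketch assumes plain k-means, random stand-in descriptors, and Euclidean distance, and it omits the spatial-relationship enhancement entirely. All function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(range(len(vocabulary)),
               key=lambda j: sq_dist(descriptor, vocabulary[j]))


def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Plain k-means (assumed here; the abstract only says 'clustering')."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def word_histogram(descriptors, vocabulary):
    """Normalized visual-word frequencies for one binding site."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def site_distance(h1, h2):
    """Euclidean distance between two histograms (an assumed metric)."""
    return math.sqrt(sq_dist(h1, h2))


# Stand-in data: random 8-dimensional vectors in place of real surface descriptors.
rng = random.Random(1)
training = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
vocabulary = build_vocabulary(training, k=5)

site_a = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(30)]
site_b = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(40)]
hist_a = word_histogram(site_a, vocabulary)
hist_b = word_histogram(site_b, vocabulary)
print(site_distance(hist_a, hist_b))
```

Because each site is reduced to a fixed-length vector, comparing two sites costs one distance computation regardless of how many surface points they contain, which is the source of the speed advantage over pairwise alignment that the abstract reports.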