Chemical Computing Group, Inc., 1010 Sherbrooke Street West, Suite 910, Montreal, Quebec, Canada H3A 2R7.
J Chem Inf Model. 2010 Aug 23;50(8):1466-75. doi: 10.1021/ci100210c.
A novel method for measuring protein pocket similarity was devised, using only the alpha carbon (Cα) positions of the pocket residues. Pockets were compared pairwise using an exhaustive three-dimensional Cα common subset search, with residues grouped by physicochemical properties. At least five Cα matches were required for each hit, and the distances between corresponding points were fitted to an extreme value distribution, yielding a probabilistic score (likelihood) for any given superposition. Using this score, a set of 85 structures from 13 diverse protein families was clustered based on binding sites alone. The score was also used successfully to cluster 25 kinases into a number of subfamilies. When a test kinase query was used to retrieve other kinase pockets, a specificity of 99.2% and a sensitivity of 97.5% could be achieved with an appropriate cutoff score. Searching the entire Protein Data Bank (133 800 pockets) took 2 to 10 min on a single 3.4 GHz CPU, depending on the number of hits returned.
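The common subset search described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each pocket is a list of (property class, Cα coordinate) pairs, that two correspondences are consistent when their intra-pocket Cα-Cα distances agree within a tolerance, and that the largest consistent set is found by a simple branch-and-bound search. The names `common_subset`, `is_hit`, `TOLERANCE`, the class labels, and the coordinates are all hypothetical; only the five-match minimum comes from the abstract. The extreme value scoring step is not reproduced here.

```python
import math

MIN_MATCHES = 5   # minimum Cα matches per hit, as stated in the abstract
TOLERANCE = 1.0   # assumed distance tolerance in angstroms (not from the source)


def common_subset(pocket_a, pocket_b, tol=TOLERANCE):
    """Return the largest list of (i, j) index pairs whose property classes
    match and whose intra-pocket Cα-Cα distances agree within tol."""
    # Candidate correspondences: residues that share a property class.
    candidates = [
        (i, j)
        for i, (class_a, _) in enumerate(pocket_a)
        for j, (class_b, _) in enumerate(pocket_b)
        if class_a == class_b
    ]

    def compatible(p, q):
        (i, j), (k, l) = p, q
        if i == k or j == l:
            return False  # each residue may be used only once
        d_a = math.dist(pocket_a[i][1], pocket_a[k][1])
        d_b = math.dist(pocket_b[j][1], pocket_b[l][1])
        return abs(d_a - d_b) <= tol

    best = []

    def extend(current, remaining):
        nonlocal best
        if len(current) > len(best):
            best = current
        for idx, cand in enumerate(remaining):
            # Prune: even taking every remaining pair cannot beat the best.
            if len(current) + len(remaining) - idx <= len(best):
                break
            # Keep only later candidates that are consistent with cand.
            rest = [c for c in remaining[idx + 1:] if compatible(cand, c)]
            extend(current + [cand], rest)

    extend([], candidates)
    return best


def is_hit(matches):
    """A superposition counts as a hit only with at least MIN_MATCHES pairs."""
    return len(matches) >= MIN_MATCHES


# Hypothetical pocket: (property class, Cα xyz in angstroms).
pocket_a = [
    ("hydrophobic", (0.0, 0.0, 0.0)),
    ("polar", (3.8, 0.0, 0.0)),
    ("positive", (5.1, 3.6, 0.0)),
    ("negative", (2.0, 6.1, 1.5)),
    ("aromatic", (-1.7, 4.2, 3.3)),
    ("small", (0.9, 1.8, 5.4)),
]

# Second pocket: the same residues rotated 90 degrees about z and translated,
# plus one distant decoy residue that should not be matched.
pocket_b = [
    (cls, (-y + 10.0, x - 3.0, z + 2.0)) for cls, (x, y, z) in pocket_a
] + [("polar", (50.0, 50.0, 50.0))]

matches = common_subset(pocket_a, pocket_b)
print(matches, is_hit(matches))
```

Because the sketch compares only internal distances, it needs no explicit superposition, but it also cannot tell a pocket from its mirror image, and its worst-case running time grows exponentially with pocket size. A production search over a database of the size mentioned above would need stronger pruning or indexing.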