Department of Biochemistry and Biophysics, The Johnson Research Foundation, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
J Chem Inf Model. 2010 Apr 26;50(4):589-603. doi: 10.1021/ci900397t.
The shape of the protein surface dictates what interactions are possible with other macromolecules, but defining discrete pockets or possible interaction sites remains difficult. First, there is the problem of defining the extent of the pocket. Second, one has to characterize the shape of each pocket. Third, one needs to make quantitative comparisons between pockets on different proteins. An elegant solution to these problems is to sort all surface and solvent points by travel depth and then collect a hierarchical tree of pockets. The connectivity of the tree is determined by the deepest saddle points between each pair of neighboring pockets. The resulting pocket surfaces tessellate the entire protein surface, producing a complete inventory of pockets. This method of identifying pockets also makes it easy to compute important shape metrics, including pocket volume (a notoriously problematic quantity to define), surface area, and mouth size. Pockets are also annotated with their lining residues, polarity, and other residue-based properties. Using this tree and the various shape metrics, pockets can be merged, grouped, or filtered for further analysis. Because this method covers the entire surface, it guarantees that any pocket of interest will be found among the output pockets, unlike all previous methods of pocket identification. The resulting hierarchy of pockets is easy to visualize and aids users in higher-level analysis. Pockets are compared using the shape metrics, which avoids the complex shape alignment problem. Example applications show that the method facilitates pocket comparison along mutational or time-dependent series. Pockets from families of proteins can be examined using multiple pocket tree alignments to see how ligand binding sites or other pockets have changed with evolution. Our method is called CLIPPERS, for Complete Liberal Inventory of Protein Pockets Elucidating and Reporting on Shape.
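The tree construction described in the abstract (sort points by depth, then join neighboring pockets at their deepest saddle point) can be illustrated with a generic merge-tree sketch. This is not the CLIPPERS implementation: it assumes that travel depth has already been computed for every point and that point adjacency is given, and the names Pocket, build_pocket_tree, and all_points are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pocket:
    """One node of the pocket tree (hypothetical structure, for illustration)."""
    ident: int
    max_depth: float                      # depth of the deepest point inside
    saddle_depth: Optional[float] = None  # depth at which it joins its parent
    points: list = field(default_factory=list)    # points added at this node
    children: list = field(default_factory=list)  # sub-pockets merged here


def build_pocket_tree(depth, neighbors):
    """Build a merge tree of pockets.

    depth:     dict mapping point -> travel depth (assumed precomputed)
    neighbors: dict mapping point -> iterable of adjacent points
    Returns the list of root pockets (one per connected component).
    """
    parent = {}     # union-find parent pointer for each visited point
    pocket_of = {}  # union-find root -> Pocket currently growing there
    next_id = 0

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    # Visit points from deepest to shallowest.
    for p in sorted(depth, key=depth.get, reverse=True):
        roots = {find(q) for q in neighbors.get(p, ()) if q in parent}
        parent[p] = p
        if not roots:
            # Local depth maximum: a new leaf pocket starts here.
            pocket_of[p] = Pocket(next_id, depth[p], points=[p])
            next_id += 1
        elif len(roots) == 1:
            # The point simply extends one existing pocket.
            (r,) = roots
            parent[p] = r
            pocket_of[r].points.append(p)
        else:
            # Saddle point: several pockets meet, so create a parent node.
            kids = sorted((pocket_of.pop(r) for r in roots),
                          key=lambda k: k.ident)
            for k in kids:
                k.saddle_depth = depth[p]
            for r in roots:
                parent[r] = p
            pocket_of[p] = Pocket(next_id, max(k.max_depth for k in kids),
                                  points=[p], children=kids)
            next_id += 1
    return list(pocket_of.values())


def all_points(pocket):
    """Collect every point belonging to a pocket, including sub-pockets."""
    pts = list(pocket.points)
    for child in pocket.children:
        pts.extend(all_points(child))
    return pts
```

Because every point ends up in exactly one node, the leaves and internal nodes together cover the whole input, which mirrors the abstract's point that the pockets tessellate the entire surface. Shape metrics such as volume or mouth size would be computed per node and are not modeled in this sketch.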