Wei L, Altman R B
Section on Medical Informatics, Stanford University, CA 94305-5479, USA.
Pac Symp Biocomput. 1998:497-508.
We have developed a new method for recognizing sites in three-dimensional protein structures. Our method is based on our previously reported algorithm for creating descriptions of protein microenvironments using physical and chemical properties at multiple levels of detail (including features at the atomic, chemical group, residue, and secondary structural levels). The recognition method takes three inputs: a set of sites that share some structural or functional role, a set of control nonsites that lack this role, and a single query site. The values of properties for the query site are compared to the distributions of values for both sites and nonsites to determine the group to which it is most similar. A log-odds scoring function, based on Bayes' Rule, computes a score that indicates the likelihood that the query region is a site of interest. In this paper, we apply the method to the task of identifying calcium binding sites in proteins. Cross-validation analysis shows that this recognition approach has high sensitivity and specificity. We also describe the results of scanning four calcium binding proteins (with the calcium removed) using a three-dimensional grid of probe points at 2 Å spacing. The probe points that have high scores cluster around the true calcium binding sites, with the highest scoring points at or near the binding sites. The method fails in only one case where a calcium binding site is created by four proteins in the crystal lattice, and is thus not recognizable within the crystallographic asymmetric unit. Our results show that property-based descriptions can be used for recognizing protein sites in unannotated structures.
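The log-odds scoring step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each property is discretized into histogram bins, that properties are treated as independent (a naive Bayes simplification), and that bin probabilities are smoothed with a pseudocount. The function names, the `oxygen_count` property, the bin edges, and the training values are all hypothetical.

```python
import math
from bisect import bisect_right


def bin_probabilities(values, edges, pseudocount=1.0):
    """Estimate P(bin) from training values, using additive smoothing.

    `edges` is a sorted list of bin boundaries; it defines len(edges) + 1 bins.
    The pseudocount (an assumption of this sketch) keeps every bin nonzero.
    """
    counts = [pseudocount] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def log_odds_score(query, site_values, nonsite_values, edges, prior_site=0.5):
    """Return the prior log-odds plus the sum of per-property log-likelihood ratios.

    A positive score means the query resembles the sites more than the nonsites.
    """
    score = math.log(prior_site / (1.0 - prior_site))
    for prop, value in query.items():
        p_site = bin_probabilities(site_values[prop], edges[prop])
        p_nonsite = bin_probabilities(nonsite_values[prop], edges[prop])
        b = bisect_right(edges[prop], value)  # bin that the query value falls into
        score += math.log(p_site[b] / p_nonsite[b])
    return score


# Hypothetical toy data: one property, counted in a shell around the probe point.
site_values = {"oxygen_count": [5, 6, 7, 6]}
nonsite_values = {"oxygen_count": [0, 1, 2, 1]}
edges = {"oxygen_count": [3]}  # two bins: below 3, and 3 or above

site_like = log_odds_score({"oxygen_count": 6}, site_values, nonsite_values, edges)
nonsite_like = log_odds_score({"oxygen_count": 1}, site_values, nonsite_values, edges)
```

With equal priors the prior term is zero, so the score is driven entirely by how much more often the query's bin occurs among sites than among nonsites. In a grid scan, the same function would be evaluated once per probe point.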