Zhang Ziding, Grigorov Martin G
Nestlé Research Center, Nestec Ltd, BioAnalytical Science, CH-1000 Lausanne 26, Switzerland.
Proteins. 2006 Feb 1;62(2):470-8. doi: 10.1002/prot.20752.
Increasing attention has been devoted to the characterization of complex networks within the protein world. This work reports how we uncovered networked structures that reflect the structural similarities among protein binding sites. First, a dataset of 211 binding sites was compiled by removing the redundant proteins from the Protein Ligand Database (PLD) (http://www-mitchell.ch.cam.ac.uk/pld/). Using a clique detection algorithm, we performed all-against-all comparisons among the 211 binding sites. Each binding site was represented as a node, and an edge was added between two nodes whenever the corresponding pair of binding sites had a similarity higher than a threshold value. The resulting similarity networks revealed that many nodes had few links and only a few were highly connected, but because of the limited data available it was not possible to definitively prove a scale-free architecture. Within the same dataset, the binding site similarity networks were compared with sequence similarity networks and fold similarity networks. Indications were found that, in the protein world, structure is better conserved than sequence, while sequence in turn is better conserved than the subset of functional residues forming the binding site. Because a binding site is strongly linked with protein function, the identification of protein binding site similarity networks could accelerate the functional annotation of newly identified genes. In view of this, we discuss several potential applications of binding site similarity networks, such as the construction of novel binding site classification databases, as well as the implications for protein molecular design in general and computational chemogenomics in particular.
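The network construction described above (one node per binding site, one edge per pair whose similarity exceeds a threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the site names, the similarity scores, and the threshold of 0.7 are hypothetical, and the clique-based scoring of the paper is not reproduced. The scores are simply taken as given.

```python
from collections import Counter
from itertools import combinations


def build_similarity_network(similarity, threshold):
    """Return an adjacency map (node -> set of neighbours).

    `similarity` maps frozenset({a, b}) -> score. An edge is added only
    when the score is strictly higher than `threshold`. Pairs missing
    from the mapping are treated as having similarity 0.0.
    """
    nodes = sorted({node for pair in similarity for node in pair})
    adjacency = {node: set() for node in nodes}
    for a, b in combinations(nodes, 2):  # all-against-all comparison
        if similarity.get(frozenset((a, b)), 0.0) > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_distribution(adjacency):
    """Count how many nodes have each degree (number of links)."""
    return Counter(len(neighbours) for neighbours in adjacency.values())


# Hypothetical toy scores, not data from the paper.
toy_scores = {
    frozenset(("site_A", "site_B")): 0.82,
    frozenset(("site_A", "site_C")): 0.75,
    frozenset(("site_A", "site_D")): 0.71,
    frozenset(("site_B", "site_C")): 0.40,
    frozenset(("site_C", "site_D")): 0.30,
    frozenset(("site_A", "site_E")): 0.20,
}

network = build_similarity_network(toy_scores, threshold=0.7)  # assumed threshold
distribution = degree_distribution(network)
for degree in sorted(distribution):
    print(f"degree {degree}: {distribution[degree]} node(s)")
```

In this toy example one node is highly connected and the rest have one link or none, which mirrors the qualitative pattern reported in the abstract. A sample this small says nothing about whether the degree distribution is scale-free.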