Nebel Jean-Christophe, Herzyk Pawel, Gilbert David R
Faculty of Computing, Information Systems & Mathematics, Kingston University, Kingston-upon-Thames, KT1 2EE, UK.
BMC Bioinformatics. 2007 Aug 30;8:321. doi: 10.1186/1471-2105-8-321.
Since many of the new protein structures delivered by high-throughput processes have no known function, there is a need for structure-based prediction of protein function. Protein 3D structures can be clustered according to their fold or secondary structures to produce classes of some functional significance. A recent alternative has been to detect specific 3D motifs, which are often associated with active sites. Unfortunately, far fewer 3D motifs are known than sequence motifs, and those that exist are usually the result of a manual process. In this paper, we report a method to automatically generate 3D motifs of protein structure binding sites based on consensus atom positions, and we evaluate it on a set of adenine-based ligands.
Our new approach was validated by automatically generating 3D patterns for the main adenine-based ligands, i.e. AMP, ADP and ATP. Of the 18 detected patterns, only one, the ADP4 pattern, is not associated with well-defined structural patterns. Moreover, most of the patterns could be classified as binding site 3D motifs. A literature search revealed that the ADP4 pattern actually corresponds to structural features that show complex evolutionary links between ligases and transferases. Therefore, all of the generated patterns prove to be meaningful. Each pattern was used to query all PDB proteins that bind either purine-based or guanine-based ligands, in order to evaluate the classification and annotation properties of the pattern. Overall, our 3D patterns matched 31% of proteins with adenine-based ligands, and 95.5% of the matched proteins were classified correctly.
A new metric has been introduced that allows proteins to be classified according to the similarity of the atomic environment of their binding sites, and a methodology has been developed to automatically produce 3D patterns from that classification. A study of proteins binding adenine-based ligands showed that these 3D patterns are not only biochemically meaningful, but can also be used for protein classification and annotation.
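The abstract does not spell out how consensus atom positions are derived, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that all binding sites have already been superimposed into a common ligand-centred frame, and it uses a simple greedy clustering in which atoms of the same element lying within a distance threshold are merged. The function name `consensus_positions` and the parameters `radius` and `min_fraction` are hypothetical and chosen for illustration.

```python
from math import dist


def consensus_positions(sites, radius=1.0, min_fraction=0.8):
    """Return (element, centroid) pairs seen in at least min_fraction of sites.

    sites: list of binding sites; each site is a list of
           (element, (x, y, z)) tuples in a shared ligand-centred frame.
    radius and min_fraction are illustrative assumptions, not published values.
    """
    clusters = []
    for site_index, site in enumerate(sites):
        for element, xyz in site:
            # Find the nearest same-element cluster that this site
            # has not already contributed to.
            best = None
            for cluster in clusters:
                if cluster["element"] != element:
                    continue
                if site_index in cluster["sites"]:
                    continue
                d = dist(cluster["centroid"], xyz)
                if d <= radius and (best is None or d < best[0]):
                    best = (d, cluster)
            if best is None:
                clusters.append({
                    "element": element,
                    "centroid": tuple(xyz),
                    "points": [tuple(xyz)],
                    "sites": {site_index},
                })
            else:
                cluster = best[1]
                cluster["points"].append(tuple(xyz))
                cluster["sites"].add(site_index)
                n = len(cluster["points"])
                # Update the centroid as the mean of all member atoms.
                cluster["centroid"] = tuple(
                    sum(p[i] for p in cluster["points"]) / n for i in range(3)
                )
    needed = min_fraction * len(sites)
    return [
        (c["element"], c["centroid"])
        for c in clusters
        if len(c["sites"]) >= needed
    ]


# Toy data: a nitrogen present in all three sites, an oxygen in only one.
toy_sites = [
    [("N", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))],
    [("N", (0.3, 0.0, 0.0))],
    [("N", (0.0, 0.3, 0.0))],
]
pattern = consensus_positions(toy_sites)
```

In this toy case only the nitrogen position survives, because the oxygen occurs in a single site and falls below the `min_fraction` threshold. A real pipeline would also need the superposition step and a principled choice of thresholds, neither of which is given in the abstract.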