State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Zhangjiang Hi-Tech Park, Pudong, Shanghai, 201203, PR China.
BMC Bioinformatics. 2010 Jan 25;11:47. doi: 10.1186/1471-2105-11-47.
Genome sequencing and post-genomics projects such as structural genomics are extending the frontier of research on the sequence-structure-function relationships of genes and their products. Although many sequence- and structure-based methods have been devised to decipher these delicate relationships, large gaps remain in this fundamental problem. These gaps continue to drive researchers to develop novel methods that extract relevant information from sequences and structures and that infer the functions of genes newly identified by genomics technologies.
Here we present an ultrafast method, named BSSF (Binding Site Similarity & Function), which enables researchers to conduct similarity searches in a comprehensive three-dimensional binding site database extracted from PDB structures. The method uses a fingerprint representation of the binding site and a validated statistical Z-score scheme to judge the similarity between the query and database entries, even when that similarity is confined to a sub-pocket. This fingerprint-based similarity measure was also validated on a known binding site dataset by comparison with geometric hashing, a standard 3D similarity method, and the comparison demonstrated the utility of the ultrafast approach. After the database search, the hit list is further analyzed to provide basic statistics on the occurrence of Gene Ontology (GO) terms and Enzyme Commission (EC) numbers, which may help researchers design further experiments on the query proteins.
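The abstract does not specify how BSSF encodes its fingerprints or defines its Z-score, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each binding site is a binary fingerprint stored as a set of "on" bit indices, compares fingerprints with the Tanimoto coefficient, converts raw scores to Z-scores against the score distribution over the whole database, and then tallies GO/EC annotations across the hits. All function names, the example site IDs, and the annotation labels are hypothetical; sub-pocket matching is not modeled.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical database layout: site_id -> (fingerprint bits, set of GO/EC labels)
Database = dict[str, tuple[frozenset[int], set[str]]]


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto coefficient between two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def zscore_search(query: frozenset[int], database: Database,
                  z_cutoff: float = 2.0) -> list[tuple[str, float]]:
    """Score the query against every site, standardize the scores against the
    database-wide distribution, and return (site_id, Z) hits sorted by Z."""
    scores = {sid: tanimoto(query, fp) for sid, (fp, _) in database.items()}
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all scores identical: nothing stands out
        return []
    hits = []
    for sid, score in scores.items():
        z = (score - mu) / sigma
        if z >= z_cutoff:
            hits.append((sid, z))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


def annotation_counts(hits: list[tuple[str, float]], database: Database) -> Counter:
    """Count how often each GO term / EC number occurs among the hit sites."""
    counts: Counter = Counter()
    for sid, _ in hits:
        counts.update(database[sid][1])
    return counts


# Toy example data (invented for illustration only).
example_query = frozenset({1, 2, 3, 4})
example_db: Database = {
    "site_A": (frozenset({1, 2, 3, 4}), {"GO:0005524", "EC 2.7.11.1"}),
    "site_B": (frozenset({1, 2, 3, 5}), {"GO:0005524", "EC 2.7.11.1"}),
    "site_C": (frozenset({10, 11}), {"GO:0016787"}),
    "site_D": (frozenset({12, 13}), {"GO:0016787"}),
    "site_E": (frozenset({14, 15}), {"GO:0003677"}),
    "site_F": (frozenset({16}), set()),
}

hits = zscore_search(example_query, example_db, z_cutoff=0.5)
counts = annotation_counts(hits, example_db)
print(hits)
print(counts.most_common())
```

Standardizing against the whole database rather than using a fixed raw-score threshold is one common way to make scores comparable across queries of different fingerprint density; whether BSSF does exactly this is an assumption here. With such a tiny toy database the cutoff has to be low, because the largest attainable Z-score grows with database size.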
This ultrafast web-based system will not only help researchers interested in drug design and structural genomics identify similar binding sites, but also assist them by providing further analysis of the hit list returned by the database search.