Trissl Silke, Rother Kristian, Müller Heiko, Steinke Thomas, Koch Ina, Preissner Robert, Frömmel Cornelius, Leser Ulf
Institute of Informatics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany.
BMC Bioinformatics. 2005 Mar 31;6:81. doi: 10.1186/1471-2105-6-81.
Structural and functional research often requires compiling sets of protein structures that share certain properties, such as sequence features, fold classification, or functional annotation. Assembling such sets with current web resources is tedious because the necessary data are spread across many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures.
COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated with a certain molecular function in the Gene Ontology, and have structures with a resolution below a defined threshold. Query results are provided both in machine-readable Extensible Markup Language (XML) and in a human-readable format. The structures themselves can be viewed interactively on the web.
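To make the combined query described above concrete, here is a minimal Python sketch of conjunctive (logical AND) filtering over integrated annotation records. It is not COLUMBA's schema or interface: the record type `StructureRecord`, the function `select_entries`, its parameters, and the placeholder identifiers (`ENTRY_A`, `PATHWAY_X`, `ARCH_1`, `FUNC_1`) are hypothetical and stand in for real PDB, KEGG, CATH, and Gene Ontology identifiers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StructureRecord:
    """Hypothetical integrated record: one PDB entry plus annotations from other sources."""
    pdb_id: str
    resolution: Optional[float]  # in angstroms; None when no resolution is reported
    pathways: frozenset = frozenset()            # e.g. pathway identifiers (hypothetical)
    cath_architectures: frozenset = frozenset()  # e.g. CATH architecture codes (hypothetical)
    go_functions: frozenset = frozenset()        # e.g. GO molecular-function terms (hypothetical)


def select_entries(
    records: Iterable[StructureRecord],
    pathway: Optional[str] = None,
    cath_architecture: Optional[str] = None,
    go_function: Optional[str] = None,
    max_resolution: Optional[float] = None,
) -> List[str]:
    """Return sorted IDs of entries that satisfy every given criterion; None means 'no constraint'."""
    hits = []
    for r in records:
        if pathway is not None and pathway not in r.pathways:
            continue
        if cath_architecture is not None and cath_architecture not in r.cath_architectures:
            continue
        if go_function is not None and go_function not in r.go_functions:
            continue
        # "Resolution below a threshold": entries without a resolution value are excluded.
        if max_resolution is not None and (r.resolution is None or r.resolution >= max_resolution):
            continue
        hits.append(r.pdb_id)
    return sorted(hits)


# Placeholder records for illustration only; not real PDB annotations.
records = [
    StructureRecord("ENTRY_A", 1.8, frozenset({"PATHWAY_X"}), frozenset({"ARCH_1"}), frozenset({"FUNC_1"})),
    StructureRecord("ENTRY_B", 2.9, frozenset({"PATHWAY_X"}), frozenset({"ARCH_1"}), frozenset({"FUNC_1"})),
    StructureRecord("ENTRY_C", 1.5, frozenset({"PATHWAY_Y"}), frozenset({"ARCH_1"}), frozenset({"FUNC_1"})),
    StructureRecord("ENTRY_D", None, frozenset({"PATHWAY_X"}), frozenset({"ARCH_1"}), frozenset({"FUNC_1"})),
]

if __name__ == "__main__":
    print(select_entries(records, pathway="PATHWAY_X", cath_architecture="ARCH_1",
                         go_function="FUNC_1", max_resolution=2.5))
```

Each criterion narrows the result independently, so adding a condition from another source (pathway, fold class, function, resolution) can only shrink the selected set; that is the behavior a combined multi-source query is expected to have.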
The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows users to combine queries across a number of structure-related databases that are not jointly covered by other projects at present. Thus, annotation data can be used efficiently whether a study involves many protein structures or only a few. The web interface for COLUMBA is available at http://www.columba-db.de.