Catanho Marcos, Mascarenhas Daniel, Degrave Wim, Miranda Antonio Basílio de
Departamento de Bioquímica e Biologia Molecular, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, RJ, Brazil.
Genet Mol Res. 2006 Mar 31;5(1):115-26.
Several databases and computational tools have been created to organize, integrate and analyze the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not support large-scale or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its core consists of the results of pair-wise sequence alignments among all the predicted proteins encoded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair and provides, for each protein sequence, the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species or strains can be produced dynamically from user-defined criteria based on one or more sequence similarity parameters. In addition, searches can be restricted by the predicted subcellular localization of the protein, the DNA strand of the corresponding gene, and/or the description of the protein. Bulk data search and retrieval are supported, and results can be exported in several ways. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
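To make the kind of query described above concrete, the following is a minimal sketch in Python using an in-memory SQLite database. It is not the actual GenoMycDB schema or interface: the table names (`protein`, `alignment`), the column names, the function names, and the toy rows (`genomeA`, `genomeB`, `A1`, `B1`, and so on) are all assumptions made up for illustration. The sketch only shows how stored pair-wise similarity parameters could be filtered by user-defined thresholds and restricted by localization, strand, or description.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one row per predicted protein, one row per aligned pair.
SCHEMA = """
CREATE TABLE protein (
    protein_id   TEXT PRIMARY KEY,
    genome       TEXT NOT NULL,
    description  TEXT,
    localization TEXT,
    strand       TEXT CHECK (strand IN ('+', '-')),
    cog          TEXT
);
CREATE TABLE alignment (
    query_id     TEXT NOT NULL REFERENCES protein(protein_id),
    subject_id   TEXT NOT NULL REFERENCES protein(protein_id),
    pct_identity REAL NOT NULL,
    evalue       REAL NOT NULL,
    query_cov    REAL NOT NULL,
    PRIMARY KEY (query_id, subject_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with invented toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("A1", "genomeA", "hypothetical protein", "cytoplasm", "+", "COG0001"),
            ("A2", "genomeA", "membrane transporter", "membrane", "-", "COG0002"),
            ("B1", "genomeB", "hypothetical protein", "cytoplasm", "+", "COG0001"),
            ("B2", "genomeB", "membrane transporter", "membrane", "+", "COG0002"),
        ],
    )
    conn.executemany(
        "INSERT INTO alignment VALUES (?, ?, ?, ?, ?)",
        [
            ("A1", "B1", 85.0, 1e-50, 95.0),
            ("A2", "B2", 40.0, 1e-10, 60.0),
            ("A1", "B2", 22.0, 0.5, 30.0),
        ],
    )
    return conn


def potential_homologs(conn, genome_a, genome_b, min_identity=0.0,
                       max_evalue=10.0, min_coverage=0.0,
                       localization=None, strand=None, description_like=None):
    """Return aligned pairs between two genomes that pass every threshold.

    Optional filters apply to the query protein (an assumption of this sketch).
    """
    sql = """
        SELECT a.query_id, a.subject_id, a.pct_identity, a.evalue, a.query_cov
        FROM alignment AS a
        JOIN protein AS q ON q.protein_id = a.query_id
        JOIN protein AS s ON s.protein_id = a.subject_id
        WHERE q.genome = ? AND s.genome = ?
          AND a.pct_identity >= ? AND a.evalue <= ? AND a.query_cov >= ?
    """
    params = [genome_a, genome_b, min_identity, max_evalue, min_coverage]
    if localization is not None:
        sql += " AND q.localization = ?"
        params.append(localization)
    if strand is not None:
        sql += " AND q.strand = ?"
        params.append(strand)
    if description_like is not None:
        sql += " AND q.description LIKE ?"
        params.append("%" + description_like + "%")
    sql += " ORDER BY a.evalue ASC, a.pct_identity DESC"
    return conn.execute(sql, params).fetchall()


def to_tsv(rows):
    """Export result rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["query", "subject", "pct_identity", "evalue", "query_cov"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    db = build_demo_db()
    hits = potential_homologs(db, "genomeA", "genomeB",
                              min_identity=30.0, max_evalue=1e-5)
    print(to_tsv(hits))
```

Combining several thresholds in one `WHERE` clause mirrors the idea of selecting potential homologs by one or multiple similarity parameters. The parameterized placeholders keep user-supplied criteria out of the SQL text itself.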