Laboratory of Genome Informatics, National Institute for Basic Biology, National Institutes of Natural Sciences, Nishigonaka 38, Myodaiji, Okazaki, Aichi 444-8585, Japan.
Data Integration and Analysis Facility, National Institute for Basic Biology, National Institutes of Natural Sciences, Nishigonaka 38, Myodaiji, Okazaki, Aichi 444-8585, Japan.
Nucleic Acids Res. 2019 Jan 8;47(D1):D382-D389. doi: 10.1093/nar/gky1054.
The Microbial Genome Database for Comparative Analysis (MBGD) is a database for comparative genomics based on comprehensive orthology analysis of bacteria, archaea and unicellular eukaryotes. MBGD now contains 6318 genomes. To support comparison of both closely related and distantly related genomes, MBGD previously provided two types of ortholog tables: the standard ortholog table, which contains one representative genome from each genus and covers the entire taxonomic range, and the taxon-specific ortholog tables for each taxon. However, this approach has a drawback in that the standard ortholog table contains only genes that are conserved in the representative genomes. To address this problem, we developed a stepwise procedure to construct ortholog tables hierarchically in a bottom-up manner. By using this approach, the new standard ortholog table now covers the entire gene repertoire stored in MBGD. In addition, we have enhanced several functionalities, including rapid and flexible keyword searching, profile-based sequence searching for orthology assignment to a user query sequence, and displaying a phylogenetic tree of each taxon based on the concatenated core gene sequences. For integrative database searching, the core data in MBGD are represented in Resource Description Framework (RDF) and a SPARQL interface is provided to search them. MBGD is available at http://mbgd.genome.ad.jp/.
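As a rough illustration of what building ortholog tables "hierarchically in a bottom-up manner" can mean, the following toy sketch clusters genes at each leaf genome and then merges clusters at every higher taxon. It is not the MBGD procedure: the taxonomy, the gene names, the precomputed `ortholog_pairs` set and the single-linkage merging rule are all assumptions made for this example. It only shows why a bottom-up table built this way retains every gene, including those absent from other genomes.

```python
from itertools import combinations


def merge_linked(clusters, ortholog_pairs):
    """Merge clusters that share at least one orthologous gene pair
    (single linkage; an assumed rule, chosen only for simplicity)."""
    merged = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(merged)), 2):
            linked = any(
                (a, b) in ortholog_pairs or (b, a) in ortholog_pairs
                for a in merged[i]
                for b in merged[j]
            )
            if linked:
                merged[i] |= merged[j]
                del merged[j]
                changed = True
                break  # indices shifted, so restart the scan
    return [frozenset(c) for c in merged]


def build_table(node, children, genes, ortholog_pairs):
    """Return the ortholog clusters for `node`, built from its children upward."""
    if node in genes:
        # Leaf genome: every gene starts as its own cluster.
        return [frozenset([g]) for g in genes[node]]
    clusters = []
    for child in children[node]:
        clusters.extend(build_table(child, children, genes, ortholog_pairs))
    return merge_linked(clusters, ortholog_pairs)


# Hypothetical taxonomy: two genera, three genomes.
children = {"root": ["genusA", "genusB"], "genusA": ["gA1", "gA2"], "genusB": ["gB1"]}
genes = {"gA1": ["a1_x", "a1_y"], "gA2": ["a2_x"], "gB1": ["b1_x", "b1_z"]}
# Hypothetical precomputed orthology hits between genes.
ortholog_pairs = {("a1_x", "a2_x"), ("a2_x", "b1_x")}

table = build_table("root", children, genes, ortholog_pairs)
for cluster in table:
    print(sorted(cluster))
```

In this toy input, the genes `a1_y` and `b1_z` have no ortholog in any other genome, yet they remain in the root-level table as singleton clusters, whereas a table restricted to genes conserved across representative genomes would omit them.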