Guhlin Joseph, Silverstein Kevin A T, Zhou Peng, Tiffin Peter, Young Nevin D
Department of Plant and Microbial Biology, 140 Gortner Laboratory, 1479 Gortner Avenue, University of Minnesota, St. Paul, MN, 55108, USA.
Minnesota Supercomputing Institute, 599 Walter Library, 117 Pleasant St. SE, Minneapolis, MN, 55455, USA.
BMC Bioinformatics. 2017 Aug 10;18(1):367. doi: 10.1186/s12859-017-1777-7.
The rapid generation of omics data in recent years has resulted in vast numbers of disconnected datasets without systemic integration or knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate those data to the larger genomic and comparative genomic context is becoming increasingly crucial for making full use of them.
The Omics Database Generator (ODG) allows users to create customized databases that integrate published genomics data with experimental data and can be queried through a flexible graph database. When provided with omics and experimental data, ODG creates a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, and UniPathway, among others. These annotations are especially useful for new or understudied species for which transcripts have only been predicted, because they rapidly add layers of annotation to predicted genes. In better-studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user interface for configuring the data import and for querying the database. Queries can also be run from the command line, and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as a generic, easy-to-use tab-separated value format for user-provided annotations.
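To make the idea of layering tab-separated annotations onto genes in a graph more concrete, the following is a minimal in-memory sketch in Python. It is not ODG's code or API: the column names (`gene`, `source`, `term`), the gene and term identifiers, and the functions `build_graph` and `genes_sharing_annotation` are all hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up annotations in tab-separated form (assumed layout, not ODG's format):
# gene identifier, annotation source, annotation term.
TSV_TEXT = (
    "gene\tsource\tterm\n"
    "geneA\tGO\tgo_term_1\n"
    "geneA\tInterPro\tipr_domain_1\n"
    "geneB\tGO\tgo_term_1\n"
    "geneC\tInterPro\tipr_domain_1\n"
)


def build_graph(tsv_text):
    """Return adjacency sets linking gene nodes and annotation nodes."""
    graph = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        gene = ("gene", row["gene"])
        term = (row["source"], row["term"])
        # Store the edge in both directions so it can be walked either way.
        graph[gene].add(term)
        graph[term].add(gene)
    return graph


def genes_sharing_annotation(graph, gene_id):
    """Find other genes reachable through any shared annotation node."""
    start = ("gene", gene_id)
    related = set()
    for term in graph.get(start, ()):
        for neighbor in graph[term]:
            if neighbor != start:
                related.add(neighbor[1])
    return sorted(related)


graph = build_graph(TSV_TEXT)
print(genes_sharing_annotation(graph, "geneA"))  # ['geneB', 'geneC']
```

The two-hop walk (gene to annotation to gene) is a small instance of the pattern-matching style of query that a graph database supports; a real graph database would express it in its own query language rather than in Python loops.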
ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.