Shah Sohrab P, Huang Yong, Xu Tao, Yuen Macaire M S, Ling John, Ouellette B F Francis
UBC Bioinformatics Centre, University of British Columbia, Vancouver, BC, Canada.
BMC Bioinformatics. 2005 Feb 21;6:34. doi: 10.1186/1471-2105-6-34.
We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development.
The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs are available in three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications, which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to these data. We present use cases in which Atlas integrates these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations.
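The loader and retrieval pattern described above can be illustrated with a minimal sketch. It is written in Python with SQLite for brevity, whereas the Atlas APIs themselves are in C++, Java, and Perl. The table layout, the function names (`open_warehouse`, `load_sequences`, `get_sequences_by_taxon`), and the accessions and residues are all hypothetical placeholders, not the actual Atlas schema or API.

```python
import sqlite3

# Hypothetical, heavily simplified schema: one table for sequence records.
# The real Atlas data models are not reproduced here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sequence (
    accession TEXT PRIMARY KEY,
    source_db TEXT NOT NULL,
    taxon_id  INTEGER NOT NULL,
    residues  TEXT NOT NULL
);
"""


def open_warehouse(path=":memory:"):
    """Open (or create) the local database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_sequences(conn, source_db, records):
    """Loader side: store already-parsed (accession, taxon_id, residues) tuples."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO sequence "
            "(accession, source_db, taxon_id, residues) VALUES (?, ?, ?, ?)",
            [(acc, source_db, taxon, seq) for acc, taxon, seq in records],
        )


def get_sequences_by_taxon(conn, taxon_id):
    """Retrieval side: the SQL call is hidden behind a plain function."""
    cursor = conn.execute(
        "SELECT accession, source_db, residues FROM sequence "
        "WHERE taxon_id = ? ORDER BY accession",
        (taxon_id,),
    )
    return cursor.fetchall()


# Placeholder records standing in for parsed source files.
conn = open_warehouse()
load_sequences(conn, "RefSeq", [("SEQ_A", 9606, "MKT"), ("SEQ_B", 10090, "MAL")])
load_sequences(conn, "UniProt", [("SEQ_C", 9606, "MSE")])
human_sequences = get_sequences_by_taxon(conn, 9606)
print(human_sequences)
```

In this sketch the caller never writes SQL directly: records from two different sources end up in one table, and a single retrieval call returns both for a given NCBI Taxonomy identifier.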
The Atlas biological data warehouse serves as a data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data, enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/
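The two levels of integration can be sketched with a toy example, again in Python with SQLite. Interactions from different sources share one table (a common data model), and a query joins them to homology groups to propose interactions in another species. The tables, the function `infer_interactions`, and the protein identifiers are assumptions made for illustration only; they do not describe how Atlas itself performs this inference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Level 1: one common model for interactions, whatever their source.
    CREATE TABLE interaction (protein_a TEXT, protein_b TEXT, source_db TEXT);
    -- Homology groups: proteins sharing a group_id are treated as homologs.
    CREATE TABLE homolog (protein TEXT, taxon_id INTEGER, group_id INTEGER);
    """
)

# Placeholder data: two human interactions from two different sources.
conn.executemany(
    "INSERT INTO interaction VALUES (?, ?, ?)",
    [("hA", "hB", "BIND"), ("hA", "hC", "DIP")],
)
conn.executemany(
    "INSERT INTO homolog VALUES (?, ?, ?)",
    [
        ("hA", 9606, 1), ("hB", 9606, 2), ("hC", 9606, 3),
        ("mA", 10090, 1), ("mB", 10090, 2),  # hC has no mouse homolog here
    ],
)


def infer_interactions(conn, target_taxon):
    """Level 2: combine the data types through a query behind a function.

    A pair is proposed in the target species only when both partners of a
    known interaction in another species have a homolog in that target.
    """
    cursor = conn.execute(
        """
        SELECT DISTINCT ta.protein, tb.protein, i.source_db
        FROM interaction AS i
        JOIN homolog AS ha ON ha.protein = i.protein_a
        JOIN homolog AS hb ON hb.protein = i.protein_b
        JOIN homolog AS ta ON ta.group_id = ha.group_id AND ta.taxon_id = ?
        JOIN homolog AS tb ON tb.group_id = hb.group_id AND tb.taxon_id = ?
        WHERE ha.taxon_id <> ? AND hb.taxon_id <> ?
        ORDER BY ta.protein, tb.protein, i.source_db
        """,
        (target_taxon, target_taxon, target_taxon, target_taxon),
    )
    return cursor.fetchall()


mouse_candidates = infer_interactions(conn, 10090)
print(mouse_candidates)
```

Each returned row keeps the originating source database, so a proposed pair can be traced back to the evidence that supports it.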