Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
Genomics. 2012 Jul;100(1):1-7. doi: 10.1016/j.ygeno.2012.05.006. Epub 2012 May 17.
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. Most bioinformatics software relies on database management systems based on the relational model. Modern non-relational databases offer an alternative with flexibility, scalability, and a non-rigid schema design. Moreover, by enabling a faster development pace, non-relational databases such as CouchDB can be ideal tools for building bioinformatics utilities. We illustrate CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations; (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash; and (c) HapMap-CN, which provides a web interface for querying copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services.
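As a hedged illustration of the programmatic access described above: CouchDB exposes documents and precomputed views over plain HTTP with JSON payloads, so a client needs only a URL and a JSON parser. The sketch below assumes a CouchDB server at `http://localhost:5984` and a hypothetical database `genesmash` with a design document `genes` and a view `by_symbol`. These names are illustrative only and are not the published geneSmash web-service interface.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Hypothetical server address; not the published geneSmash endpoint.
BASE_URL = "http://localhost:5984"


def view_url(db, ddoc, view, key):
    """Build a CouchDB view query URL.

    CouchDB expects view keys to be JSON-encoded, so a string key such as
    TP53 is sent as "TP53" (with quotes) and then URL-escaped.
    """
    params = urlencode({"key": json.dumps(key)})
    path = f"{quote(db)}/_design/{quote(ddoc)}/_view/{quote(view)}"
    return f"{BASE_URL}/{path}?{params}"


def rows_to_values(response_json):
    """Return the 'value' of each row in a decoded CouchDB view response."""
    return [row["value"] for row in response_json.get("rows", [])]


def query_view(db, ddoc, view, key):
    """Fetch and decode a view result over HTTP (requires a running server)."""
    with urlopen(view_url(db, ddoc, view, key)) as resp:
        return rows_to_values(json.load(resp))
```

For example, `view_url("genesmash", "genes", "by_symbol", "TP53")` yields `http://localhost:5984/genesmash/_design/genes/_view/by_symbol?key=%22TP53%22`. Because documents are schema-free JSON, each row's `value` can carry whatever fields the (hypothetical) view emits, without a fixed table layout.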