Benton D
National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-6050, USA.
SAR QSAR Environ Res. 1998;8(3-4):121-55. doi: 10.1080/10629369808039138.
Due to the high rate of data production and the need of researchers to have rapid access to new data, public databases have become the major medium through which genome mapping and sequencing data as well as macromolecular structural data are published. There are now more than 250 databases of biomolecular, structural, genetic, or phenotypic data, many of which are doubling in size annually. These databases, many of which were created and are maintained by experimentalists for their own research use, provide valuable collections of organized, validated data. However, the very number and diversity of databases now make efficient data resource discovery as important as effective data resource use. Existing autonomous biological databases contain related data which are more valuable when interconnected than when isolated. Political and scientific realities dictate that these databases will be built by different teams, in different locations, for different purposes, and using different data models and supporting DBMSs. As a consequence, connecting the related data they contain is not straightforward. Experience with existing biological databases indicates that it is possible to form useful queries across these databases, but that doing so usually requires expertise in the semantic structure of each source database. Advancing to the next level of integration among biological information resources poses significant technical and sociological challenges.