Heterogeneous database integration in biomedicine.

Author information

Sujansky W

Affiliation

ePocrates, Inc., 1927 Eaton Avenue, San Carlos, California 94070, USA.

Publication information

J Biomed Inform. 2001 Aug;34(4):285-98. doi: 10.1006/jbin.2001.1024.

DOI:10.1006/jbin.2001.1024

PMID:11977810

Abstract

The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
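As a rough illustration of the approach the abstract describes (a uniform conceptual schema, query translation, and terminology mapping), the following is a minimal sketch and not a system from the article. The two in-memory SQLite sources, their table and column names, the local codes `GLU` and `GLC`, and the helper `query_global` are all hypothetical. The factor of 18.0 mg/dL per mmol/L for glucose is a commonly used approximation.

```python
import sqlite3

# Hypothetical source A: glucose stored in mg/dL under the local code "GLU".
src_a = sqlite3.connect(":memory:")
src_a.execute("CREATE TABLE lab_results (patient_id TEXT, test_code TEXT, value REAL)")
src_a.executemany(
    "INSERT INTO lab_results VALUES (?, ?, ?)",
    [("p1", "GLU", 90.0), ("p2", "GLU", 126.0)],
)

# Hypothetical source B: different schema, local code "GLC", and units (mmol/L).
src_b = sqlite3.connect(":memory:")
src_b.execute("CREATE TABLE observations (pid TEXT, analyte TEXT, result_mmol REAL)")
src_b.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [("p3", "GLC", 5.0), ("p4", "GLC", 7.0)],
)

# Per-source description: how a shared concept maps to the local term
# (terminology mapping), which local SQL answers the request (query
# translation), and how to convert values to the shared unit, mg/dL.
SOURCES = [
    {
        "conn": src_a,
        "sql": "SELECT patient_id, value FROM lab_results WHERE test_code = ?",
        "terms": {"glucose": "GLU"},
        "to_mg_dl": 1.0,
    },
    {
        "conn": src_b,
        "sql": "SELECT pid, result_mmol FROM observations WHERE analyte = ?",
        "terms": {"glucose": "GLC"},
        "to_mg_dl": 18.0,  # approximate conversion factor for glucose
    },
]


def query_global(concept):
    """Return (patient_id, concept, value_mg_dl) rows from every source."""
    rows = []
    for source in SOURCES:
        local_term = source["terms"].get(concept)
        if local_term is None:
            continue  # this source does not record the requested concept
        for pid, value in source["conn"].execute(source["sql"], (local_term,)):
            rows.append((pid, concept, value * source["to_mg_dl"]))
    return sorted(rows)


for row in query_global("glucose"):
    print(row)
```

The caller sees only the shared schema `(patient_id, concept, value_mg_dl)`. Differences in table layout, local codes, and units are handled per source, which is the role the abstract assigns to the uniform conceptual schema.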
