Hart K W, Searls D B, Overton G C
Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia 19104-6145.
Comput Appl Biosci. 1994 Jul;10(4):369-78. doi: 10.1093/bioinformatics/10.4.369.
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently loads it into a relational database management system (Sybase), where information can be accessed through the relational query language SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases, and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
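To make the general idea of moving nested data into a relational form concrete, here is a minimal sketch in Python. It is not SORTEZ and does not reproduce the authors' method: the record layout, the field names (`title`, `seqs`, `accession`, `length`), the table names (`entry`, `seq`), and the sample values are all hypothetical, and Python's built-in `sqlite3` module stands in for Sybase. The sketch only illustrates one common pattern, in which a repeated nested element becomes a child table that points back to its parent row, after which ad hoc SQL queries become possible.

```python
import sqlite3

# Hypothetical nested record; the field names and values are invented for
# illustration and are not taken from the NCBI ASN.1 specification.
record = {
    "id": 1,
    "title": "example entry",
    "seqs": [
        {"accession": "X00001", "length": 120},
        {"accession": "X00002", "length": 450},
    ],
}


def flatten(rec):
    """Split one nested record into a parent row and its child rows."""
    parent = (rec["id"], rec["title"])
    # Each nested element becomes a row carrying the parent's id as a foreign key.
    children = [(rec["id"], s["accession"], s["length"]) for s in rec["seqs"]]
    return parent, children


def load(conn, records):
    """Create the two (hypothetical) tables and insert the flattened rows."""
    conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE seq ("
        "entry_id INTEGER REFERENCES entry(id), accession TEXT, length INTEGER)"
    )
    for rec in records:
        parent, children = flatten(rec)
        conn.execute("INSERT INTO entry VALUES (?, ?)", parent)
        conn.executemany("INSERT INTO seq VALUES (?, ?, ?)", children)
    conn.commit()


def long_sequences(conn, min_length):
    """Ad hoc SQL query: sequences at least min_length long, with entry title."""
    cursor = conn.execute(
        "SELECT e.title, s.accession FROM entry e "
        "JOIN seq s ON s.entry_id = e.id "
        "WHERE s.length >= ? ORDER BY s.accession",
        (min_length,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory SQLite, standing in for Sybase
load(conn, [record])
print(long_sequences(conn, 200))
```

The join in `long_sequences` is the kind of general-purpose query that a nested exchange format does not offer directly. Deciding which nested elements deserve their own tables, and how to key them, is the part the abstract describes as requiring domain expertise.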