Overton G C, Aaronson J S, Haas J, Adams J
Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia 19104-6145, USA.
J Comput Biol. 1994 Spring;1(1):3-14. doi: 10.1089/cmb.1994.1.3.
We have developed a general system, QGB, for performing complex queries on the information in the DDBJ/EMBL/GenBank databases, including queries over the structural features of sequences implied in the FEATURE TABLE. Queries are formed in a Structured Query Language (SQL)-like syntax with language extensions to support complex types (e.g., sets, ordered sets, and records) appropriate for representing and querying sequence data. A novel aspect of QGB is its ability to deduce missing features and infer relationships among features as a consequence of constructing a parse tree of sequence structure from information described in the FEATURE TABLE. The grammar for the parse tree is implemented in a customized form of the Definite Clause Grammar syntax of the logic programming language Prolog. The logic grammar formalism was chosen because it provides a perspicuous representation for features and constraints, and Prolog provides an execution model for the grammar rules. Construction of the parse tree also identifies inconsistencies and errors in the FEATURE TABLE that can in some cases be corrected automatically and used to generate an augmented version of the table.
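The idea of deducing missing features and detecting inconsistencies from the FEATURE TABLE can be illustrated with a small sketch. This is not QGB's method: the abstract describes a Definite Clause Grammar in Prolog, whereas the code below is a plain Python simplification covering a single rule (an intron lies between two consecutive exons). The names `Feature` and `augment`, and the coordinates used, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One hypothetical feature-table entry (1-based, inclusive coordinates)."""
    key: str               # feature key, e.g. "exon" or "intron"
    start: int
    end: int
    inferred: bool = False  # True if deduced rather than annotated


def augment(features):
    """Return (augmented_features, problems) for a list of Feature entries.

    Assumed rule for this sketch: every gap between two consecutive,
    non-overlapping exons is an intron. Overlapping exons are reported
    as inconsistencies instead of being repaired.
    """
    problems = []
    exons = sorted((f for f in features if f.key == "exon"),
                   key=lambda f: f.start)
    annotated_introns = {(f.start, f.end) for f in features if f.key == "intron"}
    augmented = list(features)

    for left, right in zip(exons, exons[1:]):
        if right.start <= left.end:
            problems.append(
                f"exons {left.start}..{left.end} and "
                f"{right.start}..{right.end} overlap"
            )
            continue
        gap = (left.end + 1, right.start - 1)
        if gap[0] > gap[1]:
            continue  # exons are directly adjacent, so there is no gap to fill
        if gap not in annotated_introns:
            augmented.append(Feature("intron", gap[0], gap[1], inferred=True))

    augmented.sort(key=lambda f: (f.start, f.end))
    return augmented, problems


# Hypothetical table: the intron between the first two exons is missing,
# and the last two exons overlap.
table = [
    Feature("exon", 1, 100),
    Feature("exon", 201, 300),
    Feature("exon", 250, 400),
]
augmented, problems = augment(table)
for f in augmented:
    print(f.key, f.start, f.end, "(inferred)" if f.inferred else "")
for p in problems:
    print("inconsistency:", p)
```

A grammar-based approach such as the one the abstract describes would express many such rules declaratively and obtain a full parse tree of the sequence structure; the sketch hard-codes one rule to show the shape of the input and of the augmented output.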