Shindyalov I N, Chang W, Pu C, Bourne P E
Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032.
Protein Eng. 1994 Nov;7(11):1311-22. doi: 10.1093/protein/7.11.1311.
Macromolecular query language (MMQL) is an extensible interpretive language in which to pose questions concerning the experimental or derived features of the 3-D structure of biological macromolecules. MMQL is intended to be intuitive, with a simple syntax, so that from a user's perspective complex queries are easily written. A number of basic queries and one more complex query (determining which structures contain a five-strand Greek key motif) are presented to illustrate the strengths and weaknesses of the language. The principal features of MMQL are a filter grammar and a pattern grammar, which are combined to express a wide range of biologically interesting queries. Filters permit selection on object attributes, for example compound name and resolution, whereas the patterns currently implemented query primary sequence, close contacts, hydrogen bonding, secondary structure, conformation and amino acid properties (volume, polarity, isoelectric point, hydrophobicity and different forms of exposure). MMQL queries are processed by MMQLlib, a C++ class library to which new query methods and pattern types are easily added. The prototype implementation described uses PDBlib, another C++-based class library, for representing the features of biological macromolecules at the level of detail parsable from a PDB file. Since PDBlib can represent data stored in relational and object-oriented databases as well as in PDB files, these data too can be queried by MMQL once they are loaded. Performance metrics are given for queries of PDB files, for which all derived data are calculated at run time, and are compared with those for a preliminary version of OOPDB, a prototype object-oriented database whose schema is based on a persistent version of PDBlib and which offers more efficient data access and the potential to maintain derived information. MMQLlib, PDBlib and associated software are available via anonymous ftp from cuhhca.hhmi.columbia.edu.
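The idea of combining attribute filters with patterns can be illustrated with a minimal C++ sketch. It is not MMQL syntax and does not use the MMQLlib or PDBlib interfaces, which the abstract does not show. The `Entry` record, the helper names (`resolutionAtMost`, `compoundContains`, `sequenceContains`, `allOf`, `runQuery`) and the simple substring matching are all assumptions made for illustration only.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one structure entry.
// This is NOT a PDBlib class; the fields are assumptions for illustration.
struct Entry {
    std::string id;        // identifier of the entry
    std::string compound;  // compound name
    double resolution;     // resolution in angstroms
    std::string sequence;  // primary sequence, one-letter codes
};

// A query component is modelled as a predicate over an entry.
using Predicate = std::function<bool(const Entry&)>;

// Filter on an object attribute: resolution no worse than `limit`.
Predicate resolutionAtMost(double limit) {
    return [limit](const Entry& e) { return e.resolution <= limit; };
}

// Filter on an object attribute: compound name contains `text`.
Predicate compoundContains(std::string text) {
    return [text = std::move(text)](const Entry& e) {
        return e.compound.find(text) != std::string::npos;
    };
}

// Pattern on primary sequence: here only a literal substring match,
// far simpler than the pattern types listed in the abstract.
Predicate sequenceContains(std::string motif) {
    return [motif = std::move(motif)](const Entry& e) {
        return e.sequence.find(motif) != std::string::npos;
    };
}

// Combine filters and patterns: an entry matches only if every part matches.
Predicate allOf(std::vector<Predicate> parts) {
    return [parts = std::move(parts)](const Entry& e) {
        for (const auto& part : parts) {
            if (!part(e)) return false;
        }
        return true;
    };
}

// Evaluate a combined query over a collection and return matching ids.
std::vector<std::string> runQuery(const std::vector<Entry>& entries,
                                  const Predicate& query) {
    std::vector<std::string> hits;
    for (const auto& e : entries) {
        if (query(e)) hits.push_back(e.id);
    }
    return hits;
}
```

Under these assumptions, a call such as `runQuery(entries, allOf({resolutionAtMost(2.0), sequenceContains("AAG")}))` returns the identifiers of entries that pass both the resolution filter and the sequence pattern. Treating every component as a predicate is one way to make new pattern types easy to add, though the abstract does not say how MMQLlib achieves this.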