de Alarcón P A, Gupta A, Carazo J M
Centro Nacional de Biotecnología-CSIC, Campus Universidad Autonoma, Cantoblanco, Madrid, 28049, Spain.
J Struct Biol. 1999 Apr-May;125(2-3):112-22. doi: 10.1006/jsbi.1999.4102.
The number of databases accessible over the Web is growing remarkably. In some cases, however, such as BioImage, the information is not textual, which poses new challenges for the design of tools to handle these data. In this work, we concentrate on developing new mechanisms for "querying" databases of complex data sets by their intrinsic content rather than by their textual annotations alone. We focus on a subset of BioImage containing 3D images (volumes) of biological macromolecules and implement a first prototype of a "query-by-content" system. In the context of databases of complex data types, the term query-by-content refers to data modeling techniques in which user-defined functions aim at "understanding" (to some extent) the informational content of the data sets. In these systems, the matching criteria introduced by the user relate to intrinsic features of the 3D images themselves, thereby complementing traditional queries by textual keywords. Efficient computational algorithms are required to "extract" structural information from the 3D images before they are stored in the database. Easy-to-use interfaces should also be implemented to obtain feedback from the expert. Our query-by-content prototype is used to construct a concrete query based on basic structural features, which is then evaluated over a set of three-dimensional images of biological macromolecules. This experimental implementation can be accessed via the Web at the BioImage server in Madrid, at http://www.bioimage.org/qbc/index.html.
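The extract-then-query idea described in the abstract can be illustrated with a minimal sketch. It is not the BioImage implementation: the feature set (voxel count above a density threshold and a density-weighted radius of gyration), the `Features` record, and the function names `extract_features`, `query`, and `make_cube` are all assumptions chosen for illustration, and the volumes are synthetic cubes rather than real macromolecular maps.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Features:
    """Hypothetical feature record stored next to each volume."""
    name: str
    voxel_count: int            # voxels with density above the threshold
    radius_of_gyration: float   # density-weighted spread around the center of mass, in voxels


def extract_features(name, volume, threshold=0.5):
    """Compute simple structural features of a 3D volume given as volume[z][y][x]."""
    mass = 0.0
    sx = sy = sz = 0.0
    voxels = []
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, density in enumerate(row):
                if density > threshold:
                    voxels.append((x, y, z, density))
                    mass += density
                    sx += density * x
                    sy += density * y
                    sz += density * z
    if not voxels:
        return Features(name, 0, 0.0)
    cx, cy, cz = sx / mass, sy / mass, sz / mass
    spread = sum(d * ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
                 for x, y, z, d in voxels)
    return Features(name, len(voxels), sqrt(spread / mass))


def query(catalog, min_voxels=0, max_radius=float("inf")):
    """Return names of stored volumes whose features satisfy the criteria."""
    return [f.name for f in catalog
            if f.voxel_count >= min_voxels and f.radius_of_gyration <= max_radius]


def make_cube(size, lo, hi):
    """Build a synthetic size^3 volume with density 1.0 inside [lo, hi) on each axis."""
    return [[[1.0 if all(lo <= c < hi for c in (x, y, z)) else 0.0
              for x in range(size)] for y in range(size)] for z in range(size)]


# Features are extracted once, before storage; queries touch only the catalog.
catalog = [
    extract_features("small_blob", make_cube(8, 3, 5)),
    extract_features("large_blob", make_cube(8, 1, 7)),
]
compact = query(catalog, max_radius=1.0)   # volumes with a small spatial spread
bulky = query(catalog, min_voxels=100)     # volumes with many occupied voxels
```

In this sketch the comparatively expensive pass over the voxels happens only when a volume is registered, so a query reduces to a cheap filter over the stored feature records. That mirrors the abstract's point about extracting structural information prior to storage, under the assumptions stated above.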