Biochemical Science Division, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.
BMC Bioinformatics. 2011 Dec 21;12:487. doi: 10.1186/1471-2105-12-487.
There are significant challenges associated with the building of ontologies for cell biology experiments including the large numbers of terms and their synonyms. These challenges make it difficult to simultaneously query data from multiple experiments or ontologies. If vocabulary terms were consistently used and reused across and within ontologies, queries would be possible through shared terms. One approach to achieving this is to strictly control the terms used in ontologies in the form of a pre-defined schema, but this approach limits the individual researcher's ability to create new terms when needed to describe new experiments.
Here, we propose the use of a limited number of highly reusable common root terms, and rules for an experimentalist to locally expand terms by adding more specific terms under more general root terms to form specific new vocabulary hierarchies that can be used to build ontologies. We illustrate the application of the method to build vocabularies and a prototype database for cell images that uses a visual data-tree of terms to facilitate sophisticated queries based on experimental parameters. We demonstrate how the terminology might be extended by adding new vocabulary terms into the hierarchy of terms in an evolving process. In this approach, image data and metadata are handled separately, so we also describe a robust file-naming scheme to unambiguously identify image and other files associated with each metadata value. The prototype database http://sbd.nist.gov/ consists of more than 2000 images of cells and benchmark materials, and 163 metadata terms that describe experimental details, including many details about cell culture and handling. Image files of interest can be retrieved, and their data can be compared, by choosing one or more relevant metadata values as search terms. Metadata values for any dataset can be compared with corresponding values of another dataset through logical operations.
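As a rough illustration of the idea described above, the following Python sketch builds a small vocabulary hierarchy under a few root terms and retrieves image files by combining metadata values. It is not the prototype database's implementation: the root terms (`cell`, `treatment`, `instrument`), the specific terms, the file names, and the `Term` and `find_images` helpers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One vocabulary term; children are more specific terms."""
    name: str
    children: dict = field(default_factory=dict)

    def add(self, *path):
        """Add a chain of increasingly specific terms under this term."""
        node = self
        for name in path:
            # Reuse an existing term if present, otherwise create it.
            node = node.children.setdefault(name, Term(name))
        return node

    def paths(self, prefix=()):
        """Yield the full path of this term and of every descendant."""
        here = prefix + (self.name,)
        yield "/".join(here)
        for child in self.children.values():
            yield from child.paths(here)


# Hypothetical root terms (assumed, not taken from the paper).
roots = {name: Term(name) for name in ("cell", "treatment", "instrument")}

# An experimentalist extends the vocabulary locally under the roots.
roots["cell"].add("cell line", "NIH 3T3")
roots["treatment"].add("substrate", "collagen")
roots["treatment"].add("substrate", "fibronectin")

# Metadata is kept apart from image data: each term path maps to the
# names of the files annotated with it (file names are made up).
index = {
    "cell/cell line/NIH 3T3": {"img_001.tif", "img_002.tif", "img_003.tif"},
    "treatment/substrate/collagen": {"img_001.tif", "img_003.tif"},
    "treatment/substrate/fibronectin": {"img_002.tif"},
}


def find_images(*term_paths):
    """Return files annotated with ALL of the given term paths (logical AND)."""
    sets = [index.get(path, set()) for path in term_paths]
    return set.intersection(*sets) if sets else set()


hits = find_images("cell/cell line/NIH 3T3", "treatment/substrate/collagen")
print(sorted(hits))
```

Other logical operations follow the same pattern: a set union would give a logical OR over metadata values, and a set difference would exclude files carrying a given value.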
Organizing metadata for cell imaging experiments under a framework of rules that include highly reused root terms will facilitate the addition of new terms into a vocabulary hierarchy and encourage the reuse of terms. These vocabulary hierarchies can be converted into XML schema or RDF graphs for displaying and querying, but this conversion is not necessary for using them to annotate cell images. Vocabulary data trees from multiple experiments or laboratories can be aligned at the root terms to facilitate query development. This approach to developing vocabularies is compatible with the major advances in database technology and could be used for building the Semantic Web.
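The alignment and conversion steps mentioned above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the two laboratory vocabularies, their terms, the nested-dictionary representation, and the `<vocabulary>`/`<term>` element names are hypothetical and do not reflect the schema used by the authors.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabulary trees that share the root term "cell".
lab_a = {
    "cell": {"cell line": {"NIH 3T3": {}}},
    "treatment": {"substrate": {"collagen": {}}},
}
lab_b = {
    "cell": {"cell line": {"A10": {}}},
    "instrument": {"microscope": {}},
}


def merge(a, b):
    """Align two trees at identically named terms, starting at the roots."""
    merged = {}
    for key in sorted(a.keys() | b.keys()):
        merged[key] = merge(a.get(key, {}), b.get(key, {}))
    return merged


def to_xml(tree, parent):
    """Append one <term> element per vocabulary term, recursively."""
    for name, children in tree.items():
        node = ET.SubElement(parent, "term", name=name)
        to_xml(children, node)
    return parent


merged = merge(lab_a, lab_b)
root = to_xml(merged, ET.Element("vocabulary"))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because both trees hang their specific terms under the same root term, the merge needs no mapping table: terms from both laboratories end up side by side under `cell/cell line`, and a query written against that branch covers both datasets.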