Hill David P, Blake Judith A, Richardson Joel E, Ringwald Martin
Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine 04609, USA.
Genome Res. 2002 Dec;12(12):1982-91. doi: 10.1101/gr.580102.
Structured vocabulary development enhances the management of information in biological databases. As information grows, handling the complexity of vocabularies becomes difficult. Defined methods are needed to manipulate, expand and integrate complex vocabularies. The Gene Ontology (GO) project provides the scientific community with a set of structured vocabularies to describe domains of molecular biology. The vocabularies are used for annotation of gene products and for computational annotation of sequence data sets. The vocabularies focus on three concepts universal to living systems: biological process, molecular function and cellular component. As the vocabularies expand to incorporate terms needed by diverse annotation communities, species-specific terms become problematic. In particular, the use of species-specific anatomical concepts remains unresolved. We present a method for expansion of GO into areas outside of the three original universal concept domains. We combine concepts from two orthogonal vocabularies to generate a larger, more specific vocabulary. The example of mammalian heart development is presented because it addresses two issues that challenge GO: inclusion of organism-specific anatomical terms, and proliferation of terms and relationships. The combination of concepts from orthogonal vocabularies provides a robust representation of relevant terms and an opportunity for evaluation of hypothetical concepts.
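The combination of two orthogonal vocabularies described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy term lists, the name `cross_product`, and the simplification of all relationships to an untyped "parent" link (an actual ontology would distinguish relationship types). Each combined term inherits parents from both source vocabularies, which shows how quickly terms and relationships multiply.

```python
from itertools import product

# Hypothetical toy vocabularies: each term maps to its direct parent terms.
PROCESSES = {
    "development": [],
    "morphogenesis": ["development"],
}
ANATOMY = {
    "heart": [],
    "cardiac ventricle": ["heart"],
}


def cross_product(processes, anatomy):
    """Combine every anatomy term with every process term.

    Returns a dict mapping each combined term to its direct parents,
    which are inherited from either source vocabulary.
    """
    combined = {}
    for structure, process in product(anatomy, processes):
        name = f"{structure} {process}"
        # Parents that generalize the process for the same structure.
        parents = [f"{structure} {p}" for p in processes[process]]
        # Parents that generalize the structure for the same process.
        parents += [f"{s} {process}" for s in anatomy[structure]]
        combined[name] = parents
    return combined


vocabulary = cross_product(PROCESSES, ANATOMY)
for term, parents in sorted(vocabulary.items()):
    print(term, "->", parents)
```

Two terms per vocabulary already yield four combined terms, and the most specific one has two parents. Every generated term is only a candidate: some combinations may not correspond to anything biologically real, which is where the evaluation of hypothetical concepts mentioned in the abstract comes in.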