Zhang Songmao, Bodenreider Olivier
Institute of Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, P. R. China,
Int J Semant Web Inf Syst. 2007;3(2):1-26. doi: 10.4018/jswis.2007040101.
An ontology is a formal representation of a domain that models the entities in the domain and the relations among them. When a domain is represented by multiple ontologies, mappings among these ontologies are needed to facilitate both the integration of data annotated with them and reasoning across them. The objective of this paper is to recapitulate our experience in aligning large anatomical ontologies and to reflect on some of the issues and challenges encountered along the way. The four anatomical ontologies under investigation are the Foundational Model of Anatomy, GALEN, the Adult Mouse Anatomical Dictionary and the NCI Thesaurus. Each uses a different underlying representation formalism. Our approach to aligning concepts directly is automatic, rule-based, and operates at the schema level, generating mostly point-to-point mappings. It combines domain-specific lexical techniques, which suggest candidate mappings, with structural and semantic techniques, which validate the mappings suggested lexically. It also takes advantage of domain-specific knowledge (lexical knowledge from external resources such as the Unified Medical Language System, as well as knowledge augmentation and inference techniques). In addition to point-to-point mapping of concepts, we present the alignment of relationships and the group-to-group mapping of concepts. We have also successfully tested an indirect alignment through a domain-specific reference ontology. We present an evaluation of our techniques, both against a manually established gold standard and against a generic schema matching system. The advantages and limitations of our approach are analyzed and discussed throughout the paper.
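The two-stage idea in the abstract (lexical techniques propose candidate mappings, structural techniques validate them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy ontologies, the dictionary layout, and the function names `normalize`, `lexical_matches`, `related` and `validate` are all assumptions made for illustration, and the simple token-sorting normalization stands in for the UMLS-based lexical resources mentioned in the abstract.

```python
def normalize(term):
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in term.lower())
    return " ".join(sorted(cleaned.split()))


def lexical_matches(onto_a, onto_b):
    """Pair concepts whose names or synonyms share a normalized form."""
    index = {}
    for cid, entry in onto_b.items():
        for name in entry["names"]:
            index.setdefault(normalize(name), set()).add(cid)
    pairs = set()
    for cid, entry in onto_a.items():
        for name in entry["names"]:
            for other in index.get(normalize(name), ()):
                pairs.add((cid, other))
    return pairs


def related(onto, cid):
    """Return all ancestors and descendants of cid along the 'parents' links."""
    children = {}
    for child, entry in onto.items():
        for parent in entry["parents"]:
            children.setdefault(parent, []).append(child)

    def closure(start, step):
        seen, stack = set(), list(step(start))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(step(node))
        return seen

    up = closure(cid, lambda n: onto[n]["parents"])
    down = closure(cid, lambda n: children.get(n, []))
    return up | down


def validate(pairs, onto_a, onto_b):
    """Mark a lexical match as supported if some hierarchically related
    concepts on both sides are themselves lexically matched."""
    result = {}
    for a, b in pairs:
        rel_a, rel_b = related(onto_a, a), related(onto_b, b)
        result[(a, b)] = any((x, y) in pairs for x in rel_a for y in rel_b)
    return result


# Hypothetical toy ontologies: concept id -> names/synonyms and direct parents.
ONTO_A = {
    "A1": {"names": ["Heart"], "parents": []},
    "A2": {"names": ["Left ventricle", "Left cardiac ventricle"], "parents": ["A1"]},
    "A3": {"names": ["Nail"], "parents": ["A4"]},
    "A4": {"names": ["Hand"], "parents": []},
}
ONTO_B = {
    "B1": {"names": ["heart"], "parents": []},
    "B2": {"names": ["Ventricle, left"], "parents": ["B1"]},
    "B3": {"names": ["nail"], "parents": ["B4"]},
    "B4": {"names": ["Fastener"], "parents": []},
}

pairs = lexical_matches(ONTO_A, ONTO_B)
supported = validate(pairs, ONTO_A, ONTO_B)
for pair in sorted(supported):
    print(pair, "supported" if supported[pair] else "no structural evidence")
```

In this toy data, the heart and left ventricle matches support each other through the shared hierarchy, while the two "nail" concepts match lexically but have no matched relatives, which is the kind of case a structural check is meant to flag for review.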