He Zhe, Keloth Vipina Kuttichi, Chen Yan, Geller James
School of Information, Florida State University, Tallahassee, Florida, USA
Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2018 Dec;2018:1641-1648. doi: 10.1109/BIBM.2018.8621564. Epub 2019 Jan 24.
Maintenance of biomedical ontologies is difficult. We have previously developed a topological-pattern-based method for identifying concepts in a reference ontology that could be of interest for insertion into a target ontology. Assuming that both ontologies are part of the Unified Medical Language System (UMLS), the method suggests approximate locations where the target ontology could be extended with new concepts from the reference ontology. However, the final decision about each concept has to be made by a human expert. In this paper, we describe the universe of cross-ontology topological patterns in quantitative terms. We then present a theoretical analysis of the number of potential placements of reference concepts in a path in a target ontology, allowing for new cross-ontology synonyms. This provides a rough estimate of the expert resources that need to be allocated for the task. One insight from previous work on this topic was the large percentage of cases in which importing concepts was impossible because of a configuration called "alternative classification." In this paper, we confirm this observation. Our target ontology is the National Cancer Institute Thesaurus (NCIt). However, the methods can be applied to other pairs of ontologies with hierarchical relationships from the UMLS.
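The abstract does not state the placement formula itself, so the following is only an illustrative sketch of how such a count can grow, under assumptions that are not taken from the paper: the reference concepts form an ordered chain, the target path is an ordered chain between already-matched endpoints, each reference concept is either inserted into a gap of the target path or declared a new synonym of exactly one target concept, and both orderings must be preserved. Under these assumptions the count follows the Delannoy recurrence. The function name `count_placements` and its parameters are hypothetical.

```python
from functools import lru_cache


def count_placements(num_reference: int, num_target: int) -> int:
    """Count order-preserving placements of a reference chain in a target chain.

    Assumed model (not the paper's own formula): walking both chains from the
    top, each step either inserts the next reference concept as a new concept,
    passes the next target concept unchanged, or merges the next reference
    concept with the next target concept as a new cross-ontology synonym.
    """
    if num_reference < 0 or num_target < 0:
        raise ValueError("chain lengths must be non-negative")

    @lru_cache(maxsize=None)
    def ways(r: int, t: int) -> int:
        # If one chain is exhausted, the rest of the other chain has
        # exactly one possible continuation.
        if r == 0 or t == 0:
            return 1
        return (
            ways(r - 1, t)        # insert a reference concept as new
            + ways(r, t - 1)      # keep a target concept as is
            + ways(r - 1, t - 1)  # merge the pair as synonyms
        )

    return ways(num_reference, num_target)


if __name__ == "__main__":
    for r, t in [(1, 1), (1, 2), (2, 2), (3, 3)]:
        print(f"{r} reference, {t} target concepts: {count_placements(r, t)}")
```

Even for short chains the number of candidate placements rises quickly in this model, which is consistent with the abstract's point that expert review effort needs to be estimated in advance.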