Oliveira Daniela, Pesquita Catia
Insight Centre for Data Analytics, NUI Galway, Galway Business Park, Dangan, Galway, H91 AEX4, Ireland.
LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal.
J Biomed Semantics. 2018 Jan 9;9(1):1. doi: 10.1186/s13326-017-0171-8.
Ontologies are commonly used to annotate and help process life sciences data. Although their original goal is to facilitate integration and interoperability among heterogeneous data sources, integrating these sources can be challenging when they are annotated with distinct ontologies. Over the last decade, ontology matching systems have evolved and can now produce high-quality mappings for life sciences ontologies, although these are usually limited to equivalence mappings between two ontologies. However, life sciences research is becoming increasingly transdisciplinary and integrative, fostering the need for matching strategies that can handle multiple ontologies and more complex relations between their concepts.
We have developed ontology matching algorithms that find compound mappings between multiple biomedical ontologies in the form of ternary mappings, finding for instance that "aortic valve stenosis" (HP:0001650) is equivalent to the intersection of "aortic valve" (FMA:7236) and "constricted" (PATO:0001847). To handle the increased computational demands, the algorithms filter the search space using partial mappings between ontology pairs. The evaluation shows that they produce meaningful results, with precision in the range of 60-92% for new mappings. The algorithms were also applied to the potential extension of logical definitions of the OBO and to the matching of several plant-related ontologies.
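The idea of a ternary mapping with search space filtering can be illustrated with a minimal Python sketch. It is not the published algorithm, whose details the abstract does not give. It assumes a simple word-based label match, the function names (`find_sub`, `partial_matches`, `compound_mappings`) are hypothetical, and the toy data adds an assumed synonym ("stenosis" for PATO:0001847) and a made-up class (EX:0000001) purely for illustration.

```python
from typing import Dict, Iterator, List, Set, Tuple

# An ontology is modelled here as: class ID -> set of lower-case labels/synonyms.
Ontology = Dict[str, Set[str]]


def find_sub(words: List[str], sub: List[str]) -> int:
    """Return the index where `sub` occurs contiguously in `words`, or -1."""
    for i in range(len(words) - len(sub) + 1):
        if words[i:i + len(sub)] == sub:
            return i
    return -1


def partial_matches(words: List[str], target: Ontology) -> Iterator[Tuple[str, str]]:
    """Yield (target ID, leftover text) for target labels that cover only
    part of the source label. These partial mappings act as the filter."""
    for tid, labels in target.items():
        for label in labels:
            sub = label.split()
            i = find_sub(words, sub)
            if i >= 0 and len(sub) < len(words):
                remainder = words[:i] + words[i + len(sub):]
                yield tid, " ".join(remainder)


def compound_mappings(source: Ontology, target1: Ontology,
                      target2: Ontology) -> List[Tuple[str, str, str]]:
    """Return ternary mappings (source, target1, target2), sorted."""
    found = set()
    for sid, slabels in source.items():
        for slabel in slabels:
            # Step 1: keep only source classes with a partial match in target1.
            for t1, remainder in partial_matches(slabel.split(), target1):
                # Step 2: the unmatched remainder must be a label in target2.
                for t2, labels in target2.items():
                    if remainder in labels:
                        found.add((sid, t1, t2))
    return sorted(found)


# Toy data: IDs from the abstract plus illustrative assumptions.
hp = {"HP:0001650": {"aortic valve stenosis"},
      "EX:0000001": {"headache"}}                     # made-up class
fma = {"FMA:7236": {"aortic valve"}}
pato = {"PATO:0001847": {"constricted", "stenosis"}}  # "stenosis" is assumed

print(compound_mappings(hp, fma, pato))
```

In this sketch the second ontology is searched only for source classes that already have a partial match in the first, which is the sense in which partial mappings reduce the number of candidate triples.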
This work is a first step towards finding more complex relations between multiple ontologies. The evaluation shows that the results produced are significant and that the algorithms could satisfy specific integration needs.