Faria Daniel, Pesquita Catia, Mott Isabela, Martins Catarina, Couto Francisco M, Cruz Isabel F
Instituto Gulbenkian de Ciência, R Quinta Grande 6, Oeiras, Portugal.
LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisboa, Portugal.
J Biomed Semantics. 2018 Jan 15;9(1):4. doi: 10.1186/s13326-017-0170-9.
Biomedical ontologies pose several challenges to ontology matching, owing both to the complexity of the biomedical domain and to the characteristics of the ontologies themselves. The biomedical tracks of the Ontology Alignment Evaluation Initiative (OAEI) have spurred the development of matching systems able to tackle these challenges, and have benchmarked their general performance. In this study, we dissect the strategies employed by matching systems to tackle the challenges of matching biomedical ontologies and gauge the impact of the challenges themselves on matching performance, using the AgreementMakerLight (AML) system as the platform for this study.
We demonstrate that the linear complexity of the hash-based searching strategy implemented by most state-of-the-art ontology matching systems is essential for matching large biomedical ontologies efficiently. We show that accounting for all lexical annotations (e.g., labels and synonyms) in biomedical ontologies leads to a substantial improvement in F-measure over using only the primary name, and that accounting for the reliability of different types of annotations generally also leads to a marked improvement. Finally, we show that cross-references are a reliable source of information and that, when using biomedical ontologies as background knowledge, it is generally more reliable to use them as mediators than to perform lexical expansion.
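The three strategies summarized above (hash-based searching, weighting lexical annotations by type, and using a background ontology as a mediator) can be illustrated with a minimal Python sketch. It assumes each ontology has already been flattened into `(class_id, name, annotation_type)` triples. The function names, the normalization step, the annotation-type weights in `WEIGHTS`, and the min-based scoring are hypothetical illustrative choices, not AML's actual implementation or API.

```python
from collections import defaultdict

# Hypothetical reliability weights per annotation type (illustrative values
# only; not taken from AML or from this study).
WEIGHTS = {"label": 0.9, "exact_synonym": 0.85, "related_synonym": 0.7}


def normalize(name):
    """Lowercase and collapse whitespace/underscores so trivial variants hash equally."""
    return " ".join(name.lower().replace("_", " ").split())


def build_index(lexicon):
    """Map each normalized name to the (class, weight) pairs that carry it.

    Building the index is a single linear pass over the annotations.
    """
    index = defaultdict(list)
    for cls, name, kind in lexicon:
        index[normalize(name)].append((cls, WEIGHTS[kind]))
    return index


def lexical_match(source, target):
    """Hash-based matching: one dictionary lookup per source name, so the
    expected cost is O(|source| + |target|) rather than the
    O(|source| * |target|) of comparing every pair of names."""
    index = build_index(target)
    alignment = {}
    for cls, name, kind in source:
        for tgt_cls, tgt_weight in index.get(normalize(name), []):
            # Score a shared name by the less reliable of its two annotations.
            score = min(WEIGHTS[kind], tgt_weight)
            key = (cls, tgt_cls)
            # Keep the strongest evidence when several names are shared.
            alignment[key] = max(alignment.get(key, 0.0), score)
    return alignment


def mediated_match(source, mediator, target):
    """Use a background ontology as a mediator: align source->mediator and
    mediator->target, then compose the two alignments through shared
    mediator classes (min of the two scores, an assumed combination rule)."""
    src_med = lexical_match(source, mediator)
    med_tgt = lexical_match(mediator, target)
    by_mediator = defaultdict(list)
    for (med, tgt), score in med_tgt.items():
        by_mediator[med].append((tgt, score))
    alignment = {}
    for (src, med), s1 in src_med.items():
        for tgt, s2 in by_mediator[med]:
            key = (src, tgt)
            alignment[key] = max(alignment.get(key, 0.0), min(s1, s2))
    return alignment


if __name__ == "__main__":
    source = [("S:1", "cardiac muscle", "label")]
    mediator = [("M:1", "cardiac muscle", "label"),
                ("M:1", "myocardium", "exact_synonym")]
    target = [("T:1", "Myocardium", "label")]
    print("direct:  ", lexical_match(source, target))
    print("mediated:", mediated_match(source, mediator, target))
```

In this toy example the direct lexical match finds nothing, because the source and target share no name, whereas routing through the mediator class `M:1` links `S:1` to `T:1` with a score capped by the less reliable synonym annotation.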
We anticipate that translating traditional matching algorithms to the hash-based searching paradigm will be a critical direction for the future development of the field. Improving the evaluation carried out in the biomedical tracks of the OAEI will also be important, as without proper reference alignments, there is only so much that can be ascertained about matching systems or strategies. Nevertheless, it is clear that, to tackle the various challenges posed by biomedical ontologies, ontology matching systems must be able to efficiently combine multiple strategies into a mature matching approach.