Informatics Center, Federal University of Pernambuco (CIn/UFPE), Recife, Brazil.
Bioinformatics. 2011 Jul 1;27(13):i349-56. doi: 10.1093/bioinformatics/btr226.
Ontology-like domain knowledge is frequently published in a tabular format embedded in scientific publications. We explore the re-use of such tabular content in the process of building NTDO, an ontology of neglected tropical diseases (NTDs), where the representation of the interdependencies between hosts, pathogens and vectors plays a crucial role.
As a proof of concept, we analyzed a tabular compilation of knowledge about pathogens, vectors and geographic locations involved in the transmission of NTDs. After a thorough ontological analysis of the domain of interest, we formulated a comprehensive design pattern, rooted in the biomedical upper-level domain ontology BioTop. This pattern was implemented in a VBA script that takes the cell contents of an Excel spreadsheet and transforms them into OWL-DL. After minor manual post-processing, the correctness and completeness of the ontology were tested using pre-formulated competence questions expressed as description logic (DL) queries. The ontology reproduced the expected results. We recommend the proposed approach for optimizing the acquisition of ontological domain knowledge from tabular representations.
Domain examples, source code and ontology are freely available on the web at http://www.cin.ufpe.br/~ntdo.
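The table-to-OWL step described above can be illustrated with a minimal sketch. It is written in Python rather than VBA and is not the authors' script: the column names, the sample rows, the class names (`:Disease`, `:Pathogen`, `:Vector`, `:GeographicLocation`) and the object properties (`:hasCausativeAgent`, `:transmittedBy`) are all assumptions made for illustration. The existential restrictions are deliberately naive; the BioTop-rooted design pattern of the paper is more elaborate and is not reproduced here.

```python
import csv
import io

# Hypothetical sample rows; the columns and contents of the real table are assumed.
TABLE = """disease,pathogen,vector,location
Dengue,Dengue virus,Aedes aegypti,Brazil
Chagas disease,Trypanosoma cruzi,Triatoma infestans,Bolivia
"""


def to_iri(label):
    """Turn a cell label into a CamelCase local name with a ':' prefix."""
    return ":" + "".join(word[:1].upper() + word[1:] for word in label.split())


def row_to_axioms(row):
    """Map one table row to axioms in OWL functional-style syntax.

    All class and property names used here are illustrative assumptions.
    """
    disease = to_iri(row["disease"])
    pathogen = to_iri(row["pathogen"])
    vector = to_iri(row["vector"])
    location = to_iri(row["location"])
    return [
        # Place each cell value under an assumed top-level class.
        f"SubClassOf({disease} :Disease)",
        f"SubClassOf({pathogen} :Pathogen)",
        f"SubClassOf({vector} :Vector)",
        f"SubClassOf({location} :GeographicLocation)",
        # Link the cells of one row with simple existential restrictions.
        f"SubClassOf({disease} ObjectSomeValuesFrom(:hasCausativeAgent {pathogen}))",
        f"SubClassOf({pathogen} ObjectSomeValuesFrom(:transmittedBy {vector}))",
    ]


def table_to_axioms(text):
    """Read CSV text and return the axioms of all rows, without duplicates."""
    axioms = []
    for row in csv.DictReader(io.StringIO(text)):
        for axiom in row_to_axioms(row):
            if axiom not in axioms:
                axioms.append(axiom)
    return axioms


if __name__ == "__main__":
    print("\n".join(table_to_axioms(TABLE)))
```

Generating axioms from a fixed template per row is what makes the approach repeatable: changing the pattern means changing `row_to_axioms` once, not editing every class by hand. The resulting axioms would still need to be wrapped in an ontology document with prefix declarations before a reasoner could load them.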