Mikroyannidi Eleni, Stevens Robert, Iannone Luigi, Rector Alan
School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL UK.
J Biomed Semantics. 2012 Dec 17;3(1):8. doi: 10.1186/2041-1480-3-8.
In this paper we demonstrate the use of RIO, a framework for detecting syntactic regularities using cluster analysis of the entities in the signature of an ontology. Quality assurance in ontologies is vital for their use in real applications, but it is also a complex and difficult task. Methods and tools for quality assurance are particularly important when the ontology lacks documentation and the user cannot consult the ontology developers to understand its construction. One aspect of quality assurance is checking how well an ontology complies with established 'coding standards': is the ontology regular in how descriptions of different types of entities are axiomatised? Are similar entities described in a similar way, and are there any corner cases that are not covered by a pattern? Detection of regularities and irregularities in axiom patterns should provide ontology authors and quality inspectors with a level of abstraction such that checking compliance with coding standards can be automated. However, there is a lack of such reverse ontology engineering methods and tools.
The RIO framework detects regularities in an OWL ontology, i.e. repetitive structures in the axioms of the ontology. We describe the use of standard machine learning approaches to cluster similar entities and generalise over their axioms to find regularities. This abstraction allows matches to, and deviations from, an ontology's patterns to be shown. We demonstrate its use by inspecting three modules from SNOMED-CT, a large medical terminology, that cover "Present" and "Absent" findings, as well as "Chronic" and "Acute" findings. The modules contain 5 065, 20 688 and 19 812 asserted axioms. They are analysed in terms of the types and number of regularities and irregularities in their asserted axioms. The analysis showed that some modules of the terminology, which were expected to instantiate a pattern described in the SNOMED-CT technical guide, had a high number of deviations from that pattern. A subset of these deviations was categorised as "design defects" by verifying them against past work on the quality assurance of SNOMED-CT. These were mainly incomplete descriptions. In the worst case, the expected patterns described in the technical guide were followed by only 5% of the axioms in the module.
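The idea of clustering similar entities, generalising over their axioms and then listing deviations can be illustrated with a small sketch. This is not RIO's implementation or API: the toy axioms, the entity and property names, the Jaccard similarity over property names, the single-linkage merge rule and the 0.5 threshold are all assumptions chosen for illustration, standing in for the distance measure and generalisation used by the framework.

```python
from collections import defaultdict

# Toy axioms as (subject, property, filler) triples.
# All names are hypothetical; this is not real SNOMED-CT content.
AXIOMS = [
    ("ChronicAsthma", "subClassOf", "Asthma"),
    ("ChronicAsthma", "hasCourse", "Chronic"),
    ("ChronicGastritis", "subClassOf", "Gastritis"),
    ("ChronicGastritis", "hasCourse", "Chronic"),
    ("ChronicSinusitis", "subClassOf", "Sinusitis"),  # no hasCourse axiom
    ("AcuteAsthma", "subClassOf", "Asthma"),
    ("AcuteAsthma", "hasCourse", "Acute"),
    ("LeftLung", "partOf", "Lung"),
    ("RightLung", "partOf", "Lung"),
]

# Describe each entity by the set of properties used in its axioms
# (an assumed, deliberately crude feature representation).
FEATURES = defaultdict(set)
for subject, prop, _filler in AXIOMS:
    FEATURES[subject].add(prop)


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(features, threshold=0.5):
    """Single-linkage agglomerative clustering: merge two clusters while
    any cross-cluster pair of entities has similarity >= threshold."""
    clusters = [{entity} for entity in sorted(features)]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(jaccard(features[a], features[b]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return [sorted(c) for c in clusters]


def deviations(members, features):
    """Generalise each property used in a cluster to '?x prop ?y' and
    list the cluster members that lack an axiom of that shape."""
    props = set().union(*(features[e] for e in members))
    return {
        f"?x {prop} ?y": sorted(e for e in members if prop not in features[e])
        for prop in sorted(props)
    }


CLUSTERS = cluster(FEATURES)
for members in CLUSTERS:
    print(members, deviations(members, FEATURES))
```

In this toy data the finding-like entities end up in one cluster, and the entity without a hasCourse axiom is reported as a deviation from the generalised pattern. Deciding whether such a deviation is a design defect is left to a domain expert.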
It is possible to automatically detect regularities and then inspect irregularities in an ontology. We argue that RIO is a tool for finding and reporting such matches and mismatches for evaluation by domain experts. We have demonstrated that standard clustering techniques from machine learning can serve as a tool in the drive for quality assurance in ontologies.
http://riotool.sourceforge.net/
eleni.mikroyannidi@manchester.ac.uk, robert.stevens@manchester.ac.uk