Paul Razan, Groza Tudor, Hunter Jane, Zankl Andreas
School of ITEE, The University of Queensland, St Lucia, Queensland 4072, Australia.
J Biomed Semantics. 2014 Feb 5;5(1):8. doi: 10.1186/2041-1480-5-8.
Ontologies have recently become a fundamental building block for formalising and storing complex biomedical information. Given the wealth of formalised knowledge now available, the ability to discover implicit relationships between different ontological concepts is particularly important. One of the most widely used methods for this purpose is association rule mining. However, while previous research has applied traditional association rule mining to ontologies, no approach has, to date, exploited the structure of these ontologies when computing rule interestingness measures.
We introduce a method that combines concept similarity metrics, formulated using the intrinsic structure of a given ontology, with traditional interestingness measures to compute semantic interestingness measures during association rule mining. We apply the method in our domain of interest, bone dysplasias, using the core ontologies that characterise it and an annotated dataset of patient clinical summaries, with the goal of discovering implicit relationships between clinical features and disorders. On this dataset, using a voting-strategy classification evaluation, the best-scoring traditional interestingness measure achieves an accuracy of 57.33%, while the best-scoring semantic interestingness measure achieves an accuracy of 64.38%, both at a recall cut-off point of 5.
Semantic interestingness measures outperform the traditional ones, which shows that they are able to exploit the semantic similarities inherently present between ontological concepts. Nevertheless, this result depends on the domain and, implicitly, on the semantic similarity metric chosen to model it.
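To make the idea of a semantic interestingness measure concrete, the following minimal sketch contrasts traditional support and confidence with a similarity-weighted variant. It is an illustration under stated assumptions, not the formulation used in the paper: the toy hierarchy `PARENT`, the concept and disorder names, the choice of a Wu-Palmer-style similarity, and the min/max matching rule in `match` are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical toy is-a hierarchy (child -> parent); not a real ontology.
PARENT: Dict[str, Optional[str]] = {
    "feature": None,
    "skeletal": "feature",
    "short_limbs": "skeletal",
    "short_femur": "short_limbs",
    "short_humerus": "short_limbs",
    "facial": "feature",
    "flat_face": "facial",
}

Record = Tuple[FrozenSet[str], str]  # (annotated clinical features, disorder label)


def ancestors(concept: str) -> List[str]:
    """Path from the concept up to the root, both inclusive."""
    path = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer-style similarity: 2 * depth(lcs) / (depth(a) + depth(b)), root depth 1."""
    path_a, path_b = ancestors(a), ancestors(b)
    in_b = set(path_b)
    lcs = next(n for n in path_a if n in in_b)  # deepest common ancestor
    return 2 * len(ancestors(lcs)) / (len(path_a) + len(path_b))


def match(itemset: FrozenSet[str], features: FrozenSet[str], semantic: bool) -> float:
    """Degree to which a record's features cover the itemset.

    Traditional: 1.0 on exact containment, else 0.0.
    Semantic (assumed rule): each item is matched to its most similar
    feature, and the weakest of those matches is the overall degree.
    """
    if not features:
        return 0.0
    if not semantic:
        return 1.0 if itemset <= features else 0.0
    return min(max(wu_palmer(i, f) for f in features) for i in itemset)


def rule_measures(antecedent: FrozenSet[str], disorder: str,
                  records: List[Record], semantic: bool) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> disorder."""
    n = len(records)
    degrees = [(match(antecedent, feats, semantic), label) for feats, label in records]
    support_antecedent = sum(d for d, _ in degrees) / n
    support_rule = sum(d for d, label in degrees if label == disorder) / n
    confidence = support_rule / support_antecedent if support_antecedent > 0 else 0.0
    return support_rule, confidence


# Hypothetical annotated records; "D1" and "D2" are placeholder disorder labels.
RECORDS: List[Record] = [
    (frozenset({"short_femur", "flat_face"}), "D1"),
    (frozenset({"short_humerus"}), "D1"),
    (frozenset({"flat_face"}), "D2"),
    (frozenset({"short_femur"}), "D1"),
]

if __name__ == "__main__":
    rule = frozenset({"short_femur"})
    print("traditional:", rule_measures(rule, "D1", RECORDS, semantic=False))
    print("semantic:   ", rule_measures(rule, "D1", RECORDS, semantic=True))
```

In this sketch, a record annotated with `short_humerus` contributes partially to a rule about `short_femur` because the two concepts share a close ancestor, whereas the traditional measure ignores that record entirely. Other similarity metrics or matching rules could be substituted, and the outcome would be expected to vary with that choice.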