Hoehndorf Robert, Ngonga Ngomo Axel-Cyrille, Pyysalo Sampo, Ohta Tomoko, Oellrich Anika, Rebholz-Schuhmann Dietrich
European Bioinformatics Institute, Hinxton, Cambridge, UK.
J Biomed Semantics. 2011 Oct 6;2 Suppl 5(Suppl 5):S1. doi: 10.1186/2041-1480-2-S5-S1.
Annotated reference corpora play an important role in biomedical information extraction. Semantic annotation of the natural language texts in these corpora using formal ontologies is challenging because of the inherent ambiguity of natural language. Providing formal definitions and axioms for semantic annotations offers a means of ensuring consistency and enables the development of verifiable annotation guidelines. Consistent semantic annotations in turn facilitate the automatic discovery of new information through deductive inference.
We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we select existing axiom systems based on the desired properties of the relations within the domain, and we develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application that converts annotated GENIA abstracts into OWL ontologies by combining the ontology of relations with the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable to automated verification, deductive inference and other knowledge-based applications.
Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.
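To make the idea of turning relation annotations into OWL axioms concrete, the following is a minimal sketch, not the authors' converter. It assumes that each annotated relation links two classes and that the relation is rendered as an existential restriction, which is one common way to express class-level relations in OWL; the abstract does not say which design patterns are actually used. The names `RelationAnnotation`, `to_owl_functional`, the `ex` prefix, and the example classes and `part_of` relation are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationAnnotation:
    """One annotated relation between two classes (hypothetical structure)."""
    subject: str   # class denoted by the first annotated text span
    relation: str  # name of the relation (illustrative, not from GENIA)
    obj: str       # class denoted by the second annotated text span


def to_owl_functional(annotations, prefix="ex"):
    """Render annotations as OWL 2 functional-style syntax axioms.

    Assumption: each relation is expressed as
    SubClassOf(subject ObjectSomeValuesFrom(relation object)).
    """
    lines = []
    classes = set()
    properties = set()
    for ann in annotations:
        # Declare each class the first time it is seen.
        for cls in (ann.subject, ann.obj):
            if cls not in classes:
                classes.add(cls)
                lines.append(f"Declaration(Class({prefix}:{cls}))")
        # Declare each object property the first time it is seen.
        if ann.relation not in properties:
            properties.add(ann.relation)
            lines.append(
                f"Declaration(ObjectProperty({prefix}:{ann.relation}))"
            )
        # The relation itself, as an existential restriction.
        lines.append(
            f"SubClassOf({prefix}:{ann.subject} "
            f"ObjectSomeValuesFrom({prefix}:{ann.relation} {prefix}:{ann.obj}))"
        )
    return lines


if __name__ == "__main__":
    example = [RelationAnnotation("SubunitA", "part_of", "ComplexB")]
    print("\n".join(to_owl_functional(example)))
```

Axioms of this form can be loaded into an OWL reasoner together with an ontology that axiomatizes the relations (for example, declaring a relation transitive), which is what makes consistency checks and deductive inference over the annotations possible.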