Despoina Magka, Markus Krötzsch, Ian Horrocks
Department of Computer Science, University of Oxford, Oxford, UK.
Department of Computer Science, Technical University of Dresden, Dresden, Germany.
J Biomed Semantics. 2014 Apr 15;5:17. doi: 10.1186/2041-1480-5-17. eCollection 2014.
A variety of key activities within life sciences research involve integrating and intelligently managing large amounts of biochemical information. Semantic technologies provide an intuitive way to organise and sift through these rapidly growing datasets via the design and maintenance of ontology-supported knowledge bases. To this end, OWL, a W3C standard declarative language, has been used extensively to deploy biochemical ontologies that can be conveniently organised using the classification facilities of OWL-based tools. One of the most established ontologies for the chemical domain is ChEBI, an open-access dictionary of molecular entities that supplies high-quality annotation and taxonomical information for biologically relevant compounds. However, ChEBI is expanded manually, which restricts its growth because human curation resources are limited.
In this work, we describe a prototype that performs automatic classification of chemical compounds. The software we present implements a sound and complete reasoning procedure for a formalism that extends datalog, and builds upon an off-the-shelf deductive database system. We capture a wide range of chemical classes that are not expressible with OWL-based formalisms, such as cyclic molecules, saturated molecules and alkanes. Furthermore, we describe a surface 'less-logician-like' syntax that allows application experts to create ontological descriptions of complex biochemical objects without prior knowledge of logic. In terms of performance, we observe a noticeable improvement in comparison with previous approaches. Our evaluation has discovered subsumptions that are missing from the manually curated ChEBI ontology, as well as discrepancies with respect to existing subclass relations. We thus illustrate the potential of an ontology language suitable for the life sciences domain that exhibits a favourable balance between expressive power and practical feasibility.
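To make the idea of rule-based classification more concrete, here is a small Python sketch. It is not the authors' formalism, syntax or system. Instead, it hand-rolls a naive bottom-up fixpoint for a few datalog-style rules over a hypothetical hydrogen-suppressed molecule encoding, in which atom ids map to element symbols and bonds are `(bond_id, atom_a, atom_b, order)` tuples. The helper names `connected_without` and `classify` are illustrative assumptions. So is the simplified reading of "saturated" as "no double or triple carbon-carbon bond". The sketch shows why cyclicity, which the abstract lists among classes not expressible with OWL-based formalisms, falls out naturally from recursive rules over the bond graph.

```python
from itertools import product


def connected_without(bonds):
    """Naive bottom-up evaluation of the datalog-style rules
        conn(E, X, Y) :- bond(F, X, Y), F != E.   (in both directions)
        conn(E, X, Z) :- conn(E, X, Y), conn(E, Y, Z).
    conn(E, X, Y) holds when atom Y is reachable from atom X without bond E."""
    facts = set()
    for e, _, _, _ in bonds:
        for f, a, b, _ in bonds:
            if f != e:
                facts.add((e, a, b))
                facts.add((e, b, a))
    while True:
        # Apply the recursive rule once to every pair of current facts.
        new = {
            (e1, x, z)
            for (e1, x, y), (e2, y2, z) in product(facts, facts)
            if e1 == e2 and y == y2
        } - facts
        if not new:  # fixpoint reached: no rule derives a new fact
            return facts
        facts |= new


def classify(atoms, bonds):
    """atoms: atom id -> element symbol (hydrogens implicit, hypothetical encoding).
    bonds: list of (bond_id, atom_a, atom_b, bond_order)."""
    conn = connected_without(bonds)
    # cyclic :- bond(E, X, Y), conn(E, X, Y).  (a bond whose ends stay connected without it)
    cyclic = any((e, a, b) in conn for e, a, b, _ in bonds)
    # Negated conditions are checked only after conn has reached its fixpoint,
    # mirroring stratified negation.
    saturated = not any(
        order > 1 and atoms[a] == "C" and atoms[b] == "C"
        for _, a, b, order in bonds
    )
    hydrocarbon = set(atoms.values()) == {"C"}  # all heavy atoms are carbon
    classes = set()
    if cyclic:
        classes.add("cyclic")
    if saturated:
        classes.add("saturated")
    if hydrocarbon and saturated and not cyclic:
        classes.add("alkane")
    return classes


propane = ({"c1": "C", "c2": "C", "c3": "C"},
           [("b1", "c1", "c2", 1), ("b2", "c2", "c3", 1)])
cyclopropane = ({"c1": "C", "c2": "C", "c3": "C"},
                [("b1", "c1", "c2", 1), ("b2", "c2", "c3", 1), ("b3", "c3", "c1", 1)])
ethene = ({"c1": "C", "c2": "C"}, [("b1", "c1", "c2", 2)])

print(sorted(classify(*propane)))       # ['alkane', 'saturated']
print(sorted(classify(*cyclopropane)))  # ['cyclic', 'saturated']
print(sorted(classify(*ethene)))        # []
```

Two ingredients do the work here. The first is recursion, used for `conn`. The second is evaluating a negated condition only after the recursive relation is complete, used for `not cyclic` in the alkane test. The naive loop re-joins all facts on every round and is meant only for illustration. A practical reasoner would use a more efficient strategy such as semi-naive evaluation.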
Our proposed methodology can form the basis of an ontology-mediated application that assists biocurators in producing complete and error-free taxonomies. Moreover, such a tool could contribute to a more rapid development of the ChEBI ontology and to the ChEBI team's efforts to make annotated chemical datasets available to the public. From a modelling point of view, our approach could stimulate the adoption of a different and expressive reasoning paradigm based on rules, for which state-of-the-art, highly optimised reasoners are available; it could thus pave the way for representing a broader spectrum of life sciences and biomedical knowledge.