From lexical regularities to axiomatic patterns for the quality assurance of biomedical terminologies and ontologies.

Affiliations

Department of Medical Informatics, Amsterdam Public Health research institute, Academic Medical Center, University of Amsterdam, The Netherlands.

Departamento de Informática y Sistemas, Universidad de Murcia, IMIB-Arrixaca, Murcia, Spain; Center of Operations Research (CIO), University Miguel Hernandez of Elche (UMH), Spain.

Publication information

J Biomed Inform. 2018 Aug;84:59-74. doi: 10.1016/j.jbi.2018.06.008. Epub 2018 Jun 14.

Abstract

Ontologies and terminologies have been identified as key resources for achieving semantic interoperability in biomedical domains. Ontologies are developed jointly by domain experts and knowledge engineers. The maintenance and auditing of these resources are also the responsibility of such experts, and this is usually a time-consuming, mostly manual task. Manual auditing is impractical and ineffective for most biomedical ontologies, especially for larger ones. An example is SNOMED CT, a key resource in many countries for codifying medical information. SNOMED CT contains more than 300,000 concepts. Consequently, its auditing requires the support of automatic methods. Many biomedical ontologies contain natural language content for humans and logical axioms for machines. The 'lexically suggest, logically define' principle means that there should be a relation between what is expressed in natural language and what is expressed as logical axioms, and that such a relation should be useful for auditing and quality assurance. The principle also implies that the natural language content for humans could be used to generate the logical axioms for the machines. In this work, we propose a method that combines lexical analysis and clustering techniques to (1) identify regularities in the natural language content of ontologies; (2) cluster, by similarity, labels exhibiting a regularity; (3) extract relevant information from those clusters; and (4) propose logical axioms for each cluster with the support of axiom templates. These logical axioms can then be compared against the existing axioms in the ontology to check their correctness and completeness, which are two fundamental objectives in auditing and quality assurance.
In this paper, we describe the application of the method to two SNOMED CT modules: a 'congenital' module, obtained using concepts exhibiting the attribute Occurrence - Congenital, and a 'chronic' module, obtained using concepts exhibiting the attribute Clinical course - Chronic. We obtained precision and recall of 75% and 28%, respectively, for the 'congenital' module, and of 64% and 40% for the 'chronic' one. We consider these results promising: the method can support content editors by providing automatic means of assuring the quality of biomedical ontologies and terminologies.
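
The four steps and the evaluation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the labels, the existing axioms, the `TEMPLATES` mapping, and all function names are invented for illustration, and the similarity-based clustering of step (2) is replaced by a plain grouping of labels that share a token. Axioms are represented as simple (label, attribute, value) triples rather than real OWL axioms.

```python
from collections import Counter, defaultdict

def find_regularities(labels, min_support=2):
    """Step 1 (simplified): tokens that occur in at least `min_support` labels."""
    counts = Counter()
    for label in labels:
        counts.update(set(label.lower().split()))
    return {tok for tok, n in counts.items() if n >= min_support}

def group_by_regularity(labels, regularities):
    """Step 2 (simplified): group labels by shared token, not by similarity clustering."""
    groups = defaultdict(list)
    for label in labels:
        for tok in regularities & set(label.lower().split()):
            groups[tok].append(label)
    return dict(groups)

# Hypothetical axiom templates: regularity -> (attribute, value).
TEMPLATES = {
    "congenital": ("Occurrence", "Congenital"),
    "chronic": ("Clinical course", "Chronic"),
}

def propose_axioms(groups, templates):
    """Steps 3-4 (simplified): instantiate a template for every label in a group."""
    proposed = set()
    for tok, members in groups.items():
        if tok in templates:
            attribute, value = templates[tok]
            for label in members:
                proposed.add((label, attribute, value))
    return proposed

def precision_recall(proposed, existing):
    """Compare proposed axioms with the axioms already in the ontology."""
    true_positives = len(proposed & existing)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(existing) if existing else 0.0
    return precision, recall

# Invented toy data, not taken from SNOMED CT.
labels = [
    "Congenital heart disease",
    "Congenital cataract",
    "Chronic kidney disease",
    "Chronic sinusitis",
    "Acute sinusitis",
]
existing = {
    ("Congenital heart disease", "Occurrence", "Congenital"),
    ("Chronic kidney disease", "Clinical course", "Chronic"),
    ("Chronic sinusitis", "Clinical course", "Chronic"),
    ("Acute sinusitis", "Clinical course", "Acute"),
}

groups = group_by_regularity(labels, find_regularities(labels))
proposed = propose_axioms(groups, TEMPLATES)
precision, recall = precision_recall(proposed, existing)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting, a proposed axiom that is absent from the existing set (here, the one for "Congenital cataract") lowers precision, but for an auditor it may equally point to an axiom missing from the ontology, which is how such a comparison bears on completeness as well as correctness.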
