Jiménez-Ruiz Ernesto, Grau Bernardo Cuenca, Horrocks Ian, Berlanga Rafael
Departamento de Lenguajes y Sistemas Informáticos, Universitat Jaume I, Campus de Riu Sec, Castellón, Spain.
J Biomed Semantics. 2011 Mar 7;2 Suppl 1(Suppl 1):S2. doi: 10.1186/2041-1480-2-S1-S2.
The UMLS Metathesaurus (UMLS-Meta) is currently the most comprehensive effort for integrating independently developed medical thesauri and ontologies. UMLS-Meta is used in many applications, including PubMed and ClinicalTrials.gov. The integration of new sources combines automatic techniques, expert assessment, and auditing protocols. The automatic techniques currently in use, however, are mostly based on lexical algorithms and often disregard the semantics of the sources being integrated.
In this paper, we argue that UMLS-Meta's current design and auditing methodologies could be significantly enhanced by taking into account the logic-based semantics of the ontology sources. We provide empirical evidence suggesting that the 2009AA version of UMLS-Meta contains a significant number of errors. These errors become immediately apparent once the rich semantics of the ontology sources is taken into account: they manifest themselves as unintended logical consequences that follow from the ontology sources together with the information in UMLS-Meta. We then propose general principles and specific logic-based techniques to effectively detect and repair such errors.
Our results suggest that the methodologies employed in the design of UMLS-Meta are not only very costly in terms of human effort, but also error-prone. The techniques presented here can be useful for both reducing human effort in the design and maintenance of UMLS-Meta and improving the quality of its contents.
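To illustrate the kind of error described above, the following minimal Python sketch shows how a mapping between two sources can produce an unintended logical consequence, here a class that becomes unsatisfiable once the sources are combined. It is a toy illustration and not the technique used in the paper: the class names (`A:Cyst`, `B:Structure`, and so on), the function names, and the treatment of each mapping as an equivalence are all assumptions made for this example, and the check covers only subclass and disjointness axioms, whereas a real audit would rely on a full ontology reasoner.

```python
from itertools import combinations

# Hypothetical toy data: every class name below is invented for illustration.
SUBCLASS = [("A:Cyst", "A:Disease"), ("B:Cyst", "B:Structure")]
DISJOINT = [("A:Disease", "A:Anatomy")]
# Mappings stand in for terms grouped as synonyms; assumed to mean equivalence.
MAPPINGS = [("A:Cyst", "B:Cyst"), ("B:Structure", "A:Anatomy")]


def ancestors(cls, edges):
    """Return cls together with every class reachable through subsumption edges."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(subclass_axioms, disjoint_axioms, mappings):
    """Find classes subsumed by two classes that are declared disjoint."""
    edges = {}
    for child, parent in subclass_axioms:
        edges.setdefault(child, set()).add(parent)
    # An equivalence is modelled as subsumption in both directions.
    for left, right in mappings:
        edges.setdefault(left, set()).add(right)
        edges.setdefault(right, set()).add(left)
    disjoint = {frozenset(pair) for pair in disjoint_axioms}
    classes = set(edges) | {c for targets in edges.values() for c in targets}
    found = set()
    for cls in classes:
        ups = ancestors(cls, edges)
        if any(frozenset(pair) in disjoint for pair in combinations(ups, 2)):
            found.add(cls)
    return found


if __name__ == "__main__":
    print(sorted(unsatisfiable_classes(SUBCLASS, DISJOINT, MAPPINGS)))
    print(sorted(unsatisfiable_classes(SUBCLASS, DISJOINT, [])))
```

In this toy setting neither source contains an unsatisfiable class on its own. The conflict appears only after the mappings are added, because `A:Cyst` then falls under both `A:Disease` and `A:Anatomy`, which source A declares disjoint. A purely lexical comparison of the terms would not reveal this.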