Cimino J J
Department of Medical Informatics, Columbia University College of Physicians and Surgeons, New York, NY, USA.
J Am Med Inform Assoc. 1998 Jan-Feb;5(1):41-51. doi: 10.1136/jamia.1998.0050041.
The National Library of Medicine's (NLM) Unified Medical Language System (UMLS) includes a Metathesaurus (Meta), which is a compilation of medical terms drawn from over 30 controlled vocabularies, and a Semantic Net, which contains the semantic types used to categorize Meta concepts and the semantic relations that connect them. Meta has been constructed through lexical matching techniques and human review. The purpose of this study was to audit Meta using semantic techniques to identify possible inconsistencies.
Five different techniques were applied: (1) detection of ambiguity in Meta concepts with two or more semantic types, (2) detection of interchangeable keyword synonyms, (3) detection of redundant pairs of Meta concepts (using lexical matching combined with keyword synonyms), (4) detection of inconsistent parent-child relationships in Meta (based on the semantic type information), and (5) discovery of pairs of semantic types for which relations could be added to the Semantic Net, based on "other" relationships between Meta concepts.
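Technique (3) can be illustrated with a minimal sketch. It assumes that two concept names are candidates for redundancy when they contain the same words once each word is replaced by a representative of its synonym group, ignoring word order. The function names, the concept identifiers, and the sample synonym pairs are hypothetical and are not taken from the study, and any pair the sketch proposes would still need human review.

```python
from collections import defaultdict
from itertools import combinations


def canonical_word_map(synonym_pairs):
    """Map each word to one representative of its synonym group (union-find)."""
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for a, b in synonym_pairs:
        root_a, root_b = find(a.lower()), find(b.lower())
        if root_a != root_b:
            # Keep the alphabetically smaller word as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {word: find(word) for word in list(parent)}


def candidate_redundant_pairs(concepts, synonym_pairs):
    """Return pairs of concept IDs whose names match after synonym substitution.

    `concepts` maps a (hypothetical) concept ID to its preferred name.
    Word order is ignored, which is an assumption of this sketch.
    """
    canon = canonical_word_map(synonym_pairs)
    buckets = defaultdict(list)
    for concept_id, name in concepts.items():
        key = frozenset(canon.get(w, w) for w in name.lower().split())
        buckets[key].append(concept_id)
    pairs = []
    for ids in buckets.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Illustrative data only; not actual Metathesaurus content.
concepts = {
    "C1": "Kidney Neoplasm",
    "C2": "Renal Tumor",
    "C3": "Tumor of liver",
}
synonyms = [("kidney", "renal"), ("neoplasm", "tumor")]
print(candidate_redundant_pairs(concepts, synonyms))
```

Grouping on a normalized key avoids comparing every concept against every other, which matters at the scale of tens of thousands of concepts.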
Of 57,592 concepts with multiple semantic types, 1817 (3.2%) were judged to be ambiguous. Keyword analysis showed 7121 pairs of interchangeable words. Using the keyword pairs, 5031 pairs of potentially redundant concepts were suggested, of which 3274 (65.1%) were judged to be truly redundant. Review of the 100,586 parent-child relationships revealed 544 (0.54%) that were incorrect. Review of the 219,664 "other" relationships suggested 1299 places in the Semantic Net where relations between pairs of semantic types could be added.
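The reported proportions can be recomputed directly from the counts above. The short check below uses only the figures stated in the abstract. The labels and the helper name `percent` are illustrative.

```python
# Counts as reported in the abstract: (flagged, total reviewed, decimal places).
findings = {
    "ambiguous concepts": (1817, 57592, 1),
    "redundant concept pairs": (3274, 5031, 1),
    "incorrect parent-child relationships": (544, 100586, 2),
}


def percent(part, whole, digits):
    """Return part/whole as a percentage rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


for label, (part, whole, digits) in findings.items():
    print(f"{label}: {part} of {whole} = {percent(part, whole, digits)}%")
```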
Semantic techniques, alone or in combination, can be used to audit the UMLS to detect inconsistencies that are not detectable through lexical techniques alone. Use of these methods to augment the UMLS maintenance process will lead to improvement in the UMLS.