Cimino J J, Min H, Perl Y
Department of Medical Informatics, Columbia University, New York, NY, USA.
J Biomed Inform. 2003 Dec;36(6):450-61. doi: 10.1016/j.jbi.2003.11.001.
To develop and test a method for automatically detecting inconsistencies between the parent-child is-a relationships in the Metathesaurus and the ancestor-descendant relationships in the Semantic Network of the Unified Medical Language System (UMLS).
We exploited the fact that each Metathesaurus concept is assigned one or more semantic types from the UMLS Semantic Network and that the semantic types are arranged in a hierarchy. We compared the semantic types of each pair of parent and child concepts to determine if the types "explained" the Metathesaurus is-a relationships. We considered cases where the semantic type of the parent was neither the same as, nor an ancestor of, the semantic type of the child to be "unexplained." We applied this method to the January 2002 release of the UMLS and examined the unexplained cases we discovered to determine their causes.
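The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The concept identifiers, the small hierarchy fragment, and the function names are all hypothetical. The rule for concepts with several semantic types (a pair counts as explained if any parent type equals, or is an ancestor of, any child type) is an assumption, as is the choice to treat a concept with no semantic type as unexplained.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Illustrative fragment of a semantic type hierarchy: type -> parent type
# (None marks the root). Not the full UMLS Semantic Network.
TYPE_PARENT: Dict[str, Optional[str]] = {
    "Entity": None,
    "Physical Object": "Entity",
    "Organism": "Physical Object",
    "Substance": "Physical Object",
}

# Hypothetical concepts and their assigned semantic types.
CONCEPT_TYPES: Dict[str, Set[str]] = {
    "c_object": {"Physical Object"},
    "c_organism": {"Organism"},
    "c_bacterium": {"Organism"},
    "c_drug": {"Substance"},
    "c_untyped": set(),  # concept missing a semantic type
}

# Hypothetical (parent, child) is-a pairs to audit.
ISA_PAIRS: List[Tuple[str, str]] = [
    ("c_object", "c_bacterium"),    # parent type is an ancestor of child type
    ("c_organism", "c_bacterium"),  # parent type equals child type
    ("c_organism", "c_object"),     # parent type is more specific than child type
    ("c_drug", "c_bacterium"),      # types sit on unrelated branches
    ("c_untyped", "c_drug"),        # parent has no semantic type
]


def ancestors_or_self(sem_type: str,
                      type_parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the semantic type together with all of its ancestors."""
    result: Set[str] = set()
    current: Optional[str] = sem_type
    # The membership check guards against accidental cycles in the input.
    while current is not None and current not in result:
        result.add(current)
        current = type_parent.get(current)
    return result


def is_explained(parent_types: Set[str], child_types: Set[str],
                 type_parent: Dict[str, Optional[str]]) -> bool:
    """True if some parent type equals or is an ancestor of some child type."""
    return any(parent_types & ancestors_or_self(child_type, type_parent)
               for child_type in child_types)


def find_unexplained(isa_pairs: Iterable[Tuple[str, str]],
                     concept_types: Dict[str, Set[str]],
                     type_parent: Dict[str, Optional[str]]
                     ) -> List[Tuple[str, str]]:
    """Collect the is-a pairs that the semantic types do not explain."""
    unexplained = []
    for parent, child in isa_pairs:
        parent_types = concept_types.get(parent, set())
        child_types = concept_types.get(child, set())
        if not is_explained(parent_types, child_types, type_parent):
            unexplained.append((parent, child))
    return unexplained


UNEXPLAINED = find_unexplained(ISA_PAIRS, CONCEPT_TYPES, TYPE_PARENT)
```

With this toy data, the first two pairs are explained and the last three are flagged, which mirrors the kinds of cases the audit is meant to surface for manual review.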
We found that 17,022 (24.3%) of the parent-child is-a relationships in the UMLS Metathesaurus could not be explained based on the semantic types of the concepts. Causes for these discrepancies included cases where the parent or child was missing a semantic type, cases where the semantic type of the child was too general or the semantic type of the parent was too specific, cases where the parent-child relationship was incorrect, and cases where an ancestor-descendant relationship should be added to the UMLS Semantic Network. In many cases, the specific cause of the discrepancy cannot be resolved without authoritative judgment by the UMLS developers.
Our method successfully detects inconsistencies between the hierarchies of the UMLS Metathesaurus and Semantic Network. We believe that our method should be added to the set of tools that the UMLS developers use to maintain and audit the UMLS knowledge sources.