Zhang Li, Hripcsak George, Perl Yehoshua, Halper Michael, Geller James
Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA.
Artif Intell Med. 2005 Jul;34(3):219-33. doi: 10.1016/j.artmed.2005.01.002.
A metaschema is an abstraction network of the semantic network (SN) of the Unified Medical Language System (UMLS), obtained from a connected partition of the SN's collection of semantic types. A lexical metaschema was previously derived from a lexical partition, which divided the SN into semantic-type groups according to identical word usage among the names of semantic types and the definitions of their respective children. In this paper, a statistical analysis methodology is presented to evaluate the lexical metaschema based on a study involving a group of established UMLS experts.
In the study, each expert was asked to identify subject areas of the SN based on his or her understanding of the various semantic types. For this purpose, the expert scanned the SN hierarchy top-down and marked as group roots those semantic types that were important and sufficiently different from their parent semantic types. From the response of each expert, an "expert metaschema" was constructed. Because the individual experts' metaschemas can vary widely, additional metaschemas were obtained from aggregations of the experts' responses. Of special interest is the consensus metaschema, which represents an aggregation of a simple majority of the experts' responses. A statistical analysis comparing the lexical metaschema with the experts' metaschemas and the consensus metaschema is presented.
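The aggregation step can be illustrated with a minimal sketch. It assumes that each expert's response is simply the set of semantic types marked as group roots, that a type becomes a consensus root when more than half of the experts marked it, and that every semantic type joins the group of its nearest root ancestor. The hierarchy fragment, the expert responses, and the function names `consensus_roots` and `group_of` are hypothetical and are not taken from the study.

```python
from collections import Counter

# Toy fragment of a semantic-type hierarchy (child -> parent); illustrative only.
PARENT = {
    "Physical Object": "Entity",
    "Organism": "Physical Object",
    "Plant": "Organism",
    "Animal": "Organism",
    "Vertebrate": "Animal",
}

def consensus_roots(expert_roots, top="Entity"):
    """Return the types marked as group roots by a simple majority of experts."""
    votes = Counter()
    for roots in expert_roots:
        votes.update(set(roots))          # one vote per expert per type
    majority = len(expert_roots) / 2
    chosen = {t for t, n in votes.items() if n > majority}
    chosen.add(top)                       # assumption: the top of the tree always roots a group
    return chosen

def group_of(sem_type, roots):
    """Walk up the hierarchy until a root is reached; that root names the group."""
    while sem_type not in roots:
        sem_type = PARENT[sem_type]
    return sem_type

# Hypothetical responses from three experts.
experts = [
    {"Organism", "Animal"},
    {"Organism"},
    {"Physical Object", "Organism"},
]

roots = consensus_roots(experts)
partition = {t: group_of(t, roots) for t in ["Entity", *PARENT]}
```

Because each type is assigned to its nearest root ancestor, every group in this sketch is a connected subtree, which is the property a connected partition requires.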
The analysis results show that 17 of the 21 meta-semantic types in the lexical metaschema (about 81%) also appear in the consensus metaschema. In addition, 107 semantic types (about 79%) are covered by identical meta-semantic types and refinements. These results indicate a high similarity between the two metaschemas. Furthermore, the statistical analysis shows that the lexical metaschema did not grossly underperform compared to the experts.
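The reported percentages can be checked with a few lines of arithmetic. The abstract does not state the total number of semantic types; the sketch below assumes a total of 135, and that figure is an assumption rather than a value given in the text.

```python
# Counts reported in the abstract.
shared_meta_types = 17
lexical_meta_types = 21
covered_semantic_types = 107

# Assumption: total number of semantic types in the SN (not stated in the abstract).
total_semantic_types = 135

meta_type_overlap = 100 * shared_meta_types / lexical_meta_types
semantic_type_coverage = 100 * covered_semantic_types / total_semantic_types

print(f"meta-semantic type overlap: {meta_type_overlap:.1f}%")
print(f"semantic type coverage:     {semantic_type_coverage:.1f}%")
```

Under that assumption, both values round to the percentages quoted in the abstract.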
Our study shows that the lexical metaschema provides a good approximation for a partition of meaningful subject areas in the SN, when compared to the consensus metaschema capturing the aggregation of a simple majority of the human experts' opinions.