He Zhe, Perl Yehoshua, Elhanan Gai, Chen Yan, Geller James, Bian Jiang
School of Information, Florida State University, Tallahassee, FL,
Department of Computer Science, New Jersey Institute of Technology, Newark, NJ,
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2017 Nov;2017:1262-1269. doi: 10.1109/BIBM.2017.8217840. Epub 2017 Dec 18.
The Unified Medical Language System (UMLS) is an important terminological system. By the policy of its curators, each concept of the UMLS should be assigned the most specific Semantic Types (STs) in the UMLS Semantic Network (SN). Hence, the Semantic Types of most UMLS concepts are assigned at or near the bottom (leaves) of the UMLS Semantic Network. While most ST assignments are correct, some errors do occur. Therefore, the Quality Assurance efforts of UMLS curators for ST assignments should concentrate on automatically detected sets of UMLS concepts with higher error rates than random sets. In this paper, we investigate the assignments of top-level semantic types in the UMLS Semantic Network to concepts, identify potentially erroneous assignments, and define four categories of errors, thereby helping UMLS curators avoid these assignment errors. Human experts analyzed samples of concepts assigned 10 of the top-level semantic types and categorized the erroneous ST assignments into these four logical categories. For two thirds of the concepts assigned these 10 top-level semantic types, the assignment is erroneous. Our results demonstrate that reviewing top-level semantic type assignments to concepts is an effective approach to UMLS quality assurance compared to reviewing a random selection of semantic type assignments.
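The selection step described above, collecting concepts whose assigned semantic type sits near the top of the Semantic Network, can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes assignments arrive as pipe-delimited rows in the `CUI|TUI|STN|STY|ATUI|CVF|` layout of the UMLS `MRSTY.RRF` file, and it treats "top-level" as a tree-number depth at or below a hypothetical `max_depth` threshold, whereas the paper works with a specific list of 10 semantic types that the abstract does not enumerate. The function names and the sample rows (including the CUIs and ATUIs) are made up for illustration.

```python
from collections import defaultdict


def stn_depth(stn):
    """Depth of a semantic type in the hierarchy, derived from its tree number.

    A single letter ("A", "B") is a root at depth 1, "A1" is depth 2,
    and each further dotted component adds one level ("A1.1" is depth 3).
    """
    return 1 if len(stn) == 1 else 2 + stn.count(".")


def concepts_with_top_level_types(rows, max_depth=2):
    """Map each CUI to the semantic type names assigned at depth <= max_depth.

    max_depth is an assumed cutoff for this sketch, not a value from the paper.
    """
    flagged = defaultdict(list)
    for line in rows:
        # Assumed column order: CUI|TUI|STN|STY|ATUI|CVF|
        cui, _tui, stn, sty = line.rstrip("\n").split("|")[:4]
        if stn_depth(stn) <= max_depth:
            flagged[cui].append(sty)
    return dict(flagged)


# Illustrative rows only; the CUIs and ATUIs are placeholders, not real UMLS records.
SAMPLE_ROWS = [
    "C9000001|T071|A|Entity|AT0000001||",
    "C9000002|T047|B2.2.1.2.1|Disease or Syndrome|AT0000002||",
    "C9000003|T072|A1|Physical Object|AT0000003||",
    "C9000003|T051|B|Event|AT0000004||",
]

review_candidates = concepts_with_top_level_types(SAMPLE_ROWS)
```

The resulting dictionary is only a candidate list for human review; deciding whether an assignment is actually erroneous, and which of the four error categories it falls into, remains an expert judgment.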