Mougin Fleur, Bodenreider Olivier, Burgun Anita
LESIM, INSERM U593, ISPED, University of Bordeaux 2, France.
J Biomed Inform. 2009 Jun;42(3):440-51. doi: 10.1016/j.jbi.2009.03.008. Epub 2009 Mar 18.
Polysemy is a frequent issue in biomedical terminologies. In the Unified Medical Language System (UMLS), polysemous terms are either represented as several independent concepts or clustered into a single, multiply categorized concept. The objective of this study is to analyze polysemous concepts in the UMLS through their categorization and hierarchical relations for auditing purposes.
We used the association of a concept with multiple Semantic Groups (SGs) as a surrogate for polysemy. We first extracted multi-SG (MSG) concepts from the UMLS Metathesaurus and characterized them in terms of the combinations of SGs with which they are associated. We then clustered MSG concepts in order to identify major types of polysemy. We also analyzed the inheritance of SGs in MSG concepts. Finally, we manually reviewed the categorization of the MSG concepts for auditing purposes.
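The first two steps (finding MSG concepts and characterizing them by their SG combinations) can be illustrated with a minimal sketch. It assumes semantic-type assignments laid out like the first two fields of the UMLS `MRSTY.RRF` file (CUI|TUI) and a type-to-group mapping laid out like the Semantic Groups file (GroupAbbrev|GroupName|TUI|TypeName). All rows and CUIs below are hypothetical toy data. Counting every pair for concepts in three or more SGs is also an assumption, not necessarily the authors' procedure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical excerpt in the Semantic Groups file layout: GroupAbbrev|GroupName|TUI|TypeName
SEMGROUPS_LINES = [
    "CHEM|Chemicals & Drugs|T116|Amino Acid, Peptide, or Protein",
    "CHEM|Chemicals & Drugs|T121|Pharmacologic Substance",
    "DISO|Disorders|T047|Disease or Syndrome",
    "GENE|Genes & Molecular Sequences|T028|Gene or Genome",
    "PROC|Procedures|T059|Laboratory Procedure",
]

# Hypothetical rows reduced to the first two MRSTY.RRF fields (CUI|TUI); the CUIs are made up.
MRSTY_ROWS = [
    "C9999001|T116", "C9999001|T028",  # CHEM + GENE
    "C9999002|T047",                   # single SG
    "C9999003|T116", "C9999003|T121",  # two types in the same SG (CHEM): not MSG
    "C9999004|T059", "C9999004|T047",  # PROC + DISO
    "C9999005|T116", "C9999005|T028",  # CHEM + GENE
]


def load_tui_to_sg(lines):
    """Map each semantic type (TUI) to its Semantic Group abbreviation."""
    mapping = {}
    for line in lines:
        group, _name, tui, _type_name = line.split("|")
        mapping[tui] = group
    return mapping


def concept_groups(rows, tui_to_sg):
    """Collect the set of SGs reached by each concept's semantic types."""
    groups = {}
    for row in rows:
        cui, tui = row.split("|")[:2]
        groups.setdefault(cui, set()).add(tui_to_sg[tui])
    return {cui: frozenset(sgs) for cui, sgs in groups.items()}


def msg_concepts(cui_to_sgs):
    """Keep only multi-SG (MSG) concepts, i.e., concepts in two or more SGs."""
    return {cui: sgs for cui, sgs in cui_to_sgs.items() if len(sgs) > 1}


def sg_pair_counts(msg):
    """Count SG pairs; a concept in three SGs contributes all three pairs (assumption)."""
    return Counter(pair for sgs in msg.values() for pair in combinations(sorted(sgs), 2))


tui_to_sg = load_tui_to_sg(SEMGROUPS_LINES)
msg = msg_concepts(concept_groups(MRSTY_ROWS, tui_to_sg))
pairs = sg_pair_counts(msg)
print(sorted(msg))          # ['C9999001', 'C9999004', 'C9999005']
print(pairs.most_common())  # [(('CHEM', 'GENE'), 2), (('DISO', 'PROC'), 1)]
```

Note that a concept with two semantic types in the same group (C9999003 above) is not an MSG concept, because the surrogate for polysemy is membership in multiple SGs, not multiple semantic types.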
The 1208 MSG concepts in the Metathesaurus are associated with 30 distinct pairs of SGs. We created 75 semantically homogeneous clusters of MSG concepts, and 276 MSG concepts could not be clustered for lack of hierarchical relations. The clusters were characterized by the most frequent pairs of semantic types of their constituent MSG concepts. MSG concepts exhibit limited semantic compatibility with their parent and child concepts. A large majority of MSG concepts (92%) are adequately categorized. Examples of miscategorized concepts are presented.
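One simple way to look at SG inheritance is to ask, for each SG of an MSG concept, how many of its parents share that SG. An SG that no parent shares is a candidate for manual review. The sketch below uses hypothetical concepts and parent links. Its compatibility measure (a shared SG with at least one parent) is an assumed definition chosen for illustration, not the measure reported in the paper. Child concepts could be checked the same way by swapping in a child map.

```python
# Hypothetical toy data: SGs per concept and parent links (all CUIs are made up).
CONCEPT_SGS = {
    "C9999001": frozenset({"CHEM", "GENE"}),  # MSG concept
    "C9999010": frozenset({"CHEM"}),          # parent of C9999001
    "C9999004": frozenset({"DISO", "PROC"}),  # MSG concept
    "C9999011": frozenset({"PROC"}),          # parent of C9999004
    "C9999012": frozenset({"PROC"}),          # another parent of C9999004
}
PARENTS = {
    "C9999001": ["C9999010"],
    "C9999004": ["C9999011", "C9999012"],
}


def inheritance_profile(cui, concept_sgs, parents):
    """For each SG of `cui`, count the parents that share it (assumed compatibility measure)."""
    parent_list = parents.get(cui, [])
    return {
        sg: sum(1 for p in parent_list if sg in concept_sgs[p])
        for sg in sorted(concept_sgs[cui])
    }


def unsupported_groups(cui, concept_sgs, parents):
    """Return the SGs of `cui` that no parent shares: candidates for manual review."""
    profile = inheritance_profile(cui, concept_sgs, parents)
    return [sg for sg, count in profile.items() if count == 0]


for cui in ("C9999001", "C9999004"):
    print(cui, inheritance_profile(cui, CONCEPT_SGS, PARENTS), unsupported_groups(cui, CONCEPT_SGS, PARENTS))
```

In this toy example, GENE is not inherited by C9999001 and DISO is not inherited by C9999004. Such a flag does not by itself mean the concept is miscategorized; it only marks it for the kind of manual review the study performed.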
This work is a systematic analysis and manual review of all concepts categorized by multiple SGs in the UMLS. The correctly categorized MSG concepts do reflect polysemy in the UMLS Metathesaurus. The analysis of inheritance of SGs proved useful for auditing concept categorization in the UMLS.