He Zhe, Geller James, Chen Yan
Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA.
Artif Intell Med. 2015 May;64(1):29-40. doi: 10.1016/j.artmed.2015.03.002. Epub 2015 Apr 2.
Medical terminologies vary in the amount of concept information (the "density") represented, even in the same sub-domains. This causes problems in terminology mapping, semantic harmonization and terminology integration. Moreover, complex clinical scenarios need to be encoded by a medical terminology with comprehensive content. SNOMED Clinical Terms (SNOMED CT), a leading clinical terminology, was reported to lack concepts and synonyms, problems that cannot be fully alleviated by using post-coordination. Therefore, a scalable solution is needed to enrich the conceptual content of SNOMED CT. We are developing a structure-based, algorithmic method to identify potential concepts for enriching the conceptual content of SNOMED CT and to support semantic harmonization of SNOMED CT with selected other Unified Medical Language System (UMLS) terminologies.
We first identified a subset of English terminologies in the UMLS that have 'PAR' (parent) relationships labeled 'IS_A' and that overlap by more than 10% with one or more of the 19 hierarchies of SNOMED CT. We call these "reference terminologies"; note that our use of this name differs from the standard use. Next, we defined a set of topological patterns across pairs of terminologies, with SNOMED CT being one terminology in each pair and the other being one of the reference terminologies. We then explored how often these topological patterns appear between SNOMED CT and each reference terminology, and how to interpret them.
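The two structural steps described above (the overlap filter and the search for a topological pattern) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each terminology is given as a mapping from a concept identifier to its direct IS_A parents, that concepts are matched across terminologies by a shared identifier (such as a UMLS CUI), and that a "trapezoid" means a pair of shared concepts linked directly in one terminology but through one unshared intermediate concept in the other. The abstract does not define the pattern, so that reading, along with the names `Hierarchy`, `concepts_of`, `overlap_ratio` and `find_trapezoids` and the toy data, is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: concept ID -> set of direct IS_A parent IDs.
Hierarchy = Dict[str, Set[str]]


def concepts_of(h: Hierarchy) -> Set[str]:
    """All concept IDs mentioned in a hierarchy, as children or as parents."""
    ids = set(h)
    for parents in h.values():
        ids |= parents
    return ids


def overlap_ratio(hierarchy_ids: Set[str], other_ids: Set[str]) -> float:
    """Fraction of the concepts in `hierarchy_ids` that also occur in `other_ids`."""
    if not hierarchy_ids:
        return 0.0
    return len(hierarchy_ids & other_ids) / len(hierarchy_ids)


def find_trapezoids(dense: Hierarchy, sparse: Hierarchy) -> List[Tuple[str, str, str]]:
    """Find (child, intermediate, ancestor) triples under the assumed definition:
    child -> ancestor is a direct link in `sparse`, while `dense` has
    child -> intermediate -> ancestor and the intermediate is absent from `sparse`.
    """
    sparse_ids = concepts_of(sparse)
    shared = concepts_of(dense) & sparse_ids
    found = []
    for child in shared:
        for ancestor in sparse.get(child, set()):
            if ancestor not in shared:
                continue
            for mid in dense.get(child, set()):
                # The intermediate is a candidate for import into `sparse`.
                if mid not in sparse_ids and ancestor in dense.get(mid, set()):
                    found.append((child, mid, ancestor))
    return sorted(found)


# Toy data (invented for illustration only).
snomed_like: Hierarchy = {"B": {"A"}}
reference_like: Hierarchy = {"B": {"M"}, "M": {"A"}}

# Overlap filter: keep the reference terminology only if more than 10% is shared.
ratio = overlap_ratio(concepts_of(snomed_like), concepts_of(reference_like))
print(ratio > 0.10)
print(find_trapezoids(dense=reference_like, sparse=snomed_like))
```

Swapping the `dense` and `sparse` arguments searches in the opposite direction, which under the same assumption would correspond to the other trapezoid orientation. Any intermediate concept found this way is only a candidate; as the abstract states, a human curator would still have to approve it.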
Four viable reference terminologies were identified. Large density differences between terminologies were found. Expected interpretations of these differences were indeed observed, as follows. A random sample of 299 instances of special topological patterns ("2:3 and 3:2 trapezoids") showed that 39.1% and 59.5% of analyzed concepts in SNOMED CT and in a reference terminology, respectively, were deemed to be alternative classifications of the same conceptual content. In 30.5% and 17.6% of the cases, it was found that intermediate concepts could be imported into SNOMED CT or into the reference terminology, respectively, to enhance their conceptual content, if approved by a human curator. Other cases included synonymy and errors in one of the terminologies.
These results show that structure-based algorithmic methods can be used to identify potential concepts to enrich SNOMED CT and the four reference terminologies. In the future, this comparative analysis could support terminology authoring by suggesting new content that improves content coverage and semantic harmonization between terminologies.