Yale University School of Medicine, USA.
J Am Med Inform Assoc. 2010 Sep-Oct;17(5):602-7. doi: 10.1136/jamia.2009.001057.
To identify challenges in mapping internal International Classification of Diseases, 9th revision, Clinical Modification (ICD-9-CM) encoded legacy data to Systematized Nomenclature of Medicine (SNOMED), using SNOMED-prescribed compositional approaches where appropriate, and to explore the mapping coverage provided by the US National Library of Medicine (NLM)'s SNOMED clinical core subset.
This study selected ICD-9-CM codes that occurred at least 100 times in the organization's problem list or diagnosis data in 2008. After eliminating codes whose exact mappings were already available in UMLS, the remainder were mapped manually with software assistance.
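The selection step described above can be sketched as a frequency filter followed by a lookup against existing exact mappings. This is a minimal illustration, not the study's software: the function name, the `min_count` parameter, the toy counts, and the placeholder target identifier are all assumptions made for the example.

```python
from collections import Counter


def select_codes_for_manual_mapping(encounter_codes, exact_umls_map, min_count=100):
    """Return (frequent, needs_manual) lists of ICD-9-CM codes.

    encounter_codes: iterable of ICD-9-CM codes, one per recorded occurrence.
    exact_umls_map:  dict of ICD-9-CM code -> SNOMED target for codes that
                     already have an exact mapping (hypothetical structure).
    """
    counts = Counter(encounter_codes)
    # Keep only codes that occur at least `min_count` times.
    frequent = sorted(code for code, n in counts.items() if n >= min_count)
    # Codes without an existing exact mapping are left for manual mapping.
    needs_manual = [code for code in frequent if code not in exact_umls_map]
    return frequent, needs_manual


# Toy data: the counts and the target identifier are made up for illustration.
codes = ["250.00"] * 150 + ["401.9"] * 120 + ["V70.0"] * 5
umls = {"250.00": "SCTID-PLACEHOLDER"}
frequent, manual = select_codes_for_manual_mapping(codes, umls)
print(frequent)  # codes that pass the frequency threshold
print(manual)    # codes that still need manual mapping
```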
Of the 2194 codes, 784 (35.7%) required manual mapping. Of these, 435 represented concept types documented in SNOMED as deprecated: these included qualifying phrases such as 'not elsewhere classified'. A third of the codes were composite, requiring multiple SNOMED codes to map. Representing 45 composite concepts required introducing disjunction ('or') or set-difference ('without') operators, which are not currently defined in SNOMED. Only 47% of the concepts required for composition were present in the clinical core subset. Searching SNOMED for the correct concepts often required extensive application of knowledge of both English and medical synonymy.
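One way to picture a composite mapping that needs 'or' and 'without' is as a small expression tree over component concepts, from which coverage by a subset can be computed. The sketch below is a hypothetical data structure, not SNOMED syntax and not the representation used in the study; the class names, operator labels, rendered notation, and concept labels are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Concept:
    label: str  # placeholder label, not a real SNOMED identifier


@dataclass(frozen=True)
class Composite:
    op: str  # "and", "or", or "without" (illustrative operator names)
    operands: Tuple["Expr", ...]


Expr = Union[Concept, Composite]


def render(expr):
    """Render an expression tree as a readable string (ad hoc notation)."""
    if isinstance(expr, Concept):
        return expr.label
    joiner = {"and": " + ", "or": " OR ", "without": " WITHOUT "}[expr.op]
    return "(" + joiner.join(render(e) for e in expr.operands) + ")"


def leaf_concepts(expr):
    """Collect the labels of all component concepts in the expression."""
    if isinstance(expr, Concept):
        return {expr.label}
    return set().union(*(leaf_concepts(e) for e in expr.operands))


def core_coverage(expr, core_subset):
    """Fraction of component concepts that appear in a given subset."""
    leaves = leaf_concepts(expr)
    return len(leaves & core_subset) / len(leaves)


# Hypothetical composite: (A or B) without C.
expr = Composite(
    "without",
    (
        Composite("or", (Concept("disorder A"), Concept("disorder B"))),
        Concept("complication C"),
    ),
)
print(render(expr))
print(core_coverage(expr, {"disorder A"}))
```

Keeping the operators as explicit nodes makes it easy to count how many mappings depend on 'or' or 'without', which is the kind of tally reported above.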
Strategies to deal with legacy ICD data must address the issue of codes created by non-taxonomist users. The NLM core subset possibly needs augmentation with concepts from certain SNOMED hierarchies, notably qualifiers, body structures, substances/products and organisms. Concept-matching software needs to utilize query expansion strategies, but these may be effective in production settings only if a large but non-redundant SNOMED subset that minimizes the proportion of extensively pre-coordinated concepts is also available.
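Query expansion of the kind mentioned above can be illustrated with a toy synonym table: each word in a search phrase is swapped for its known synonyms, and every resulting variant is matched against concept descriptions. This is only a sketch under stated assumptions; the `SYNONYMS` table, the function names, and the exact-match search are hypothetical and far simpler than a production concept matcher.

```python
from itertools import product

# Hypothetical toy synonym table (lay term -> medical synonyms).
SYNONYMS = {
    "kidney": ["renal"],
    "stone": ["calculus"],
}


def expand_query(query):
    """Return every variant of the query with words swapped for synonyms."""
    options = [[w] + SYNONYMS.get(w, []) for w in query.lower().split()]
    return [" ".join(combo) for combo in product(*options)]


def search(query, descriptions):
    """Return descriptions that exactly match any expanded query variant."""
    variants = set(expand_query(query))
    return [d for d in descriptions if d.lower() in variants]


print(expand_query("Kidney stone"))
print(search("kidney stone", ["Renal calculus", "Heart failure"]))
```

The number of variants grows multiplicatively with the number of words that have synonyms, which is one reason the size and redundancy of the searched subset matter in practice.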