Kate Rohit J
Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, United States.
JMIR Med Inform. 2021 Jan 14;9(1):e23104. doi: 10.2196/23104.
Clinical terms mentioned in clinical text are often not in their standardized forms as listed in clinical terminologies because of linguistic and stylistic variations. However, many automated downstream applications require clinical terms mapped to their corresponding concepts in clinical terminologies, thus necessitating the task of clinical term normalization.
This paper presents a system for clinical term normalization that uses edit patterns to convert clinical terms into their normalized forms.
The edit patterns are learned automatically from the Unified Medical Language System (UMLS) Metathesaurus and from the given training data. They are generalized sequences of edits derived from edit distance computations. The patterns are both character based and word based, and they are learned separately for different semantic types. In addition to applying these edit patterns, the system normalizes clinical terms through the subconcepts mentioned within them.
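To make the idea of an edit pattern concrete, the following is a minimal Python sketch, not the system's actual implementation. It computes a standard Levenshtein edit script between a term and its normalized form and then generalizes it by collapsing unchanged items into a wildcard. The names `edit_script` and `generalize`, the tuple representation, and the wildcard scheme are illustrative assumptions. The sketch does not cover learning from the UMLS or handling semantic types.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (operation, source item, target item).
Edit = Tuple[str, str, str]
WILDCARD: Edit = ("keep", "*", "*")


def edit_script(src: Sequence[str], tgt: Sequence[str]) -> List[Edit]:
    """Return one minimum-cost edit script turning src into tgt.

    Works on any sequence: pass a string for character-level edits
    or a list of words for word-level edits.
    """
    n, m = len(src), len(tgt)
    # dist[i][j] = edit distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete src[i-1]
                dist[i][j - 1] + 1,         # insert tgt[j-1]
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )

    # Walk back from the bottom-right cell to recover the edits.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            edits.append(("keep", src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and src[i - 1] != tgt[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1] + 1:
            edits.append(("sub", src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("del", src[i - 1], ""))
            i -= 1
        else:
            edits.append(("ins", "", tgt[j - 1]))
            j -= 1
    edits.reverse()
    return edits


def generalize(edits: List[Edit]) -> List[Edit]:
    """Collapse each run of unchanged items into a single wildcard."""
    pattern: List[Edit] = []
    for edit in edits:
        if edit[0] == "keep":
            if not pattern or pattern[-1] != WILDCARD:
                pattern.append(WILDCARD)
        else:
            pattern.append(edit)
    return pattern


# Word-level example: only the first word changes.
word_pattern = generalize(
    edit_script("kidney failure".split(), "renal failure".split())
)

# Character-level example: a spelling variant with one deleted letter.
char_pattern = generalize(edit_script("tumour", "tumor"))
```

Under these assumptions, the word-level pattern records a substitution of "kidney" by "renal" followed by any unchanged words, and the character-level pattern records the deletion of a single "u" between unchanged stretches. A pattern of this kind could in principle be reused on other terms that share the same variation.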
The system was evaluated as part of the 2019 n2c2 Track 3 shared task on clinical term normalization. It obtained 80.79% accuracy on the standard test data. This paper includes ablation studies that evaluate the contributions of the system's different components. A challenging part of the task was disambiguation when a clinical term could be normalized to multiple concepts.
The learned edit patterns led the system to perform well on the normalization task. Because the system is based on patterns, it is human interpretable and can also offer insight into common variations of clinical terms in clinical text that differ from their standardized forms.