Chen Fangyi, Zhang Gongbo, Chen Si, Callahan Tiffany, Weng Chunhua
Department of Biomedical Informatics, Columbia University, New York, NY, USA.
AMIA Jt Summits Transl Sci Proc. 2024 May 31;2024:515-524. eCollection 2024.
Clinical notes are full of ambiguous medical abbreviations. Recent learning-based approaches leverage contextual knowledge for sense disambiguation. Previous findings indicate that structural elements of clinical notes carry useful information for distinguishing among interpretations of an abbreviation, yet they remain underutilized and have not been fully investigated. To the best of our knowledge, the only study exploring note structure simply enumerated the headers in the notes, a representation that is not semantically meaningful. This paper describes a learning-based approach that represents note structure using the semantic types predefined in the Unified Medical Language System (UMLS). We evaluated this representation as an addition to the widely used N-gram features, with three learning models on two different datasets. Experiments indicate that our feature augmentation consistently improved model performance for abbreviation disambiguation, with a best F1 score of 0.93.
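The feature augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature-key format, and the header-to-semantic-type lookup table are all hypothetical. The type names used are real UMLS semantic types, but the abstract does not say how section headers were mapped to them. The sketch builds N-gram counts from the context around an abbreviation and appends one categorical feature for the semantic type of the enclosing note section.

```python
from collections import Counter

# Hypothetical lookup from a section header to a UMLS semantic type.
# The type names exist in UMLS; the mapping itself is an assumption
# made for illustration and is not taken from the paper.
SECTION_SEMANTIC_TYPE = {
    "medications": "Pharmacologic Substance",
    "laboratory data": "Laboratory Procedure",
    "physical exam": "Finding",
}


def ngram_features(tokens, n_max=2):
    """Count all word n-grams of length 1..n_max in a token list."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats["ngram=" + " ".join(tokens[i:i + n])] += 1
    return feats


def augmented_features(context, section_header, n_max=2):
    """N-gram counts plus one feature for the section's semantic type."""
    tokens = context.lower().split()
    feats = ngram_features(tokens, n_max)
    # Headers missing from the lookup fall back to a single UNKNOWN bucket.
    sem_type = SECTION_SEMANTIC_TYPE.get(section_header.strip().lower(), "UNKNOWN")
    feats["section_semtype=" + sem_type] = 1
    return dict(feats)


# "MS" is the ambiguous abbreviation; the section header supplies extra signal.
example = augmented_features("patient started on MS for pain control", "Medications")
```

A dictionary of this shape could be passed to any classifier that accepts sparse feature maps. Keeping the section feature under its own key prefix makes it straightforward to train the same model with and without the augmentation and compare the two.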