MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score.

Author information

Kwon Sunjae, Yao Zonghai, Jordan Harmon S, Levy David A, Corner Brian, Yu Hong

Affiliations

UMass Amherst.

Health Research Consultant.

Publication information

Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:11733-11751.

Abstract

This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms in electronic health record (EHR) notes that patients may find difficult to comprehend. We first present a novel and publicly available dataset (MedJ) with expert-annotated medical jargon terms from 18K+ EHR note sentences. Then, we introduce a novel medical jargon extraction model (MedJEx), which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx improved overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, in which hyperlink spans point to additional Wikipedia articles that explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Second, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved performance on six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available.
