MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score.

Authors

Kwon Sunjae, Yao Zonghai, Jordan Harmon S, Levy David A, Corner Brian, Yu Hong

Affiliations

UMass Amherst.

Health Research Consultant.

Publication

Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:11733-11751.

PMID: 37103473
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10129059/
Abstract

This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences (MedJ). Then, we introduce a novel medical jargon extraction (MedJEx) model which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx improved the overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, where hyperlink spans provide additional Wikipedia articles to explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Secondly, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available.
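The abstract names two ingredients without giving their exact formulation, so the following is only an illustrative sketch under stated assumptions, not the authors' released MedJEx code. The first function assumes Wikipedia sentences arrive as MediaWiki markup (`[[Target]]` or `[[Target|anchor text]]`) and turns each hyperlink anchor into a B/I span label, one plausible way to build auxiliary span data before fine-tuning on MedJ. The second is a pseudo-log-likelihood-style score: each token of a candidate term is masked in turn and its negative log-probability given the rest of the sentence is averaged, so less predictable terms score higher. The names `hyperlink_spans_to_bio`, `masked_lm_score`, `prob_of_token`, `TOY_FREQ`, and `toy_prob` are hypothetical; in practice `prob_of_token` would wrap a real masked language model, whereas the toy stand-in here ignores context entirely.

```python
import math
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed input format: MediaWiki links written as [[Target]] or [[Target|anchor text]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def hyperlink_spans_to_bio(wikitext: str) -> List[Tuple[str, str]]:
    """Label each hyperlink anchor as a B/I span and every other token as O.

    Whitespace tokenization is a simplifying assumption.
    """
    pairs: List[Tuple[str, str]] = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        # Plain text before this link lies outside any span.
        pairs.extend((tok, "O") for tok in wikitext[pos:match.start()].split())
        # The visible anchor text is the span; fall back to the target title.
        anchor = match.group(2) or match.group(1)
        for i, tok in enumerate(anchor.split()):
            pairs.append((tok, "B" if i == 0 else "I"))
        pos = match.end()
    # Trailing plain text after the last link.
    pairs.extend((tok, "O") for tok in wikitext[pos:].split())
    return pairs


def masked_lm_score(
    tokens: Sequence[str],
    span: Tuple[int, int],
    prob_of_token: Callable[[List[str], int, str], float],
    mask_token: str = "[MASK]",
) -> float:
    """Mean negative log-probability of the tokens in tokens[start:end].

    Each token is masked in turn and scored given the rest of the sentence,
    in the style of pseudo-log-likelihood scoring; higher means less predictable.
    """
    start, end = span
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span must be a non-empty slice of tokens")
    total = 0.0
    for i in range(start, end):
        masked = list(tokens)
        masked[i] = mask_token
        total -= math.log(prob_of_token(masked, i, tokens[i]))
    return total / (end - start)


# Hypothetical toy stand-in for a masked LM: it ignores context entirely and
# returns a fixed unigram probability, with a small floor for unseen words.
TOY_FREQ: Dict[str, float] = {"the": 0.05, "heart": 0.001, "attack": 0.001}


def toy_prob(masked: List[str], position: int, original: str) -> float:
    return TOY_FREQ.get(original.lower(), 1e-6)
```

For example, with tokens `["the", "heart", "attack", "needed", "cholecystectomy"]`, the span `(4, 5)` scores higher than `(1, 3)` under the toy model because "cholecystectomy" is assigned a lower probability. Passing the scoring function in as a callable keeps the sketch independent of any particular model library.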

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43dc/10129059/3d8d1818503c/nihms-1843448-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43dc/10129059/08ea6d131db8/nihms-1843448-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43dc/10129059/1682b31aa11d/nihms-1843448-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43dc/10129059/c61568eb7338/nihms-1843448-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43dc/10129059/5b43389c82d0/nihms-1843448-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43dc/10129059/a26603b84892/nihms-1843448-f0007.jpg

Cited by

1
MedReadCtrl: Personalizing medical text generation with readability-controlled instruction learning.
medRxiv. 2025 Jul 11:2025.07.09.25331239. doi: 10.1101/2025.07.09.25331239.
2
MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain.
Proc Conf Empir Methods Nat Lang Process. 2024 Nov;2024:17293-17319. doi: 10.18653/v1/2024.emnlp-main.958.
3
ODD: A Benchmark Dataset for the Natural Language Processing Based Opioid Related Aberrant Behavior Detection.
Proc Conf. 2024 Jun;2024:4338-4359.
4
Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing.
AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:592-601. eCollection 2023.
5
Automated identification of eviction status from electronic health record notes.
J Am Med Inform Assoc. 2023 Jul 19;30(8):1429-1437. doi: 10.1093/jamia/ocad081.

References

1
SPARClink: an interactive tool to visualize the impact of the SPARC program.
F1000Res. 2022 Jan 31;11:124. doi: 10.12688/f1000research.75071.1. eCollection 2022.
2
Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit.
Artif Intell Med. 2021 Jul;117:102083. doi: 10.1016/j.artmed.2021.102083. Epub 2021 May 1.
3
Evaluating the Effectiveness of NoteAid in a Community Hospital Setting: Randomized Trial of Electronic Health Record Note Comprehension Interventions With Patients.
J Med Internet Res. 2021 May 13;23(5):e26354. doi: 10.2196/26354.
4
Self-Diagnosis through AI-enabled Chatbot-based Symptom Checkers: User Experiences and Design Considerations.
AMIA Annu Symp Proc. 2021 Jan 25;2020:1354-1363. eCollection 2020.
5
BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
6
Overview of the First Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE 1.0).
Drug Saf. 2019 Jan;42(1):99-111. doi: 10.1007/s40264-018-0762-z.
7
Training to Improve Communication Quality: An Efficient Interdisciplinary Experience for Emergency Department Clinicians.
Am J Med Qual. 2019 May/Jun;34(3):260-265. doi: 10.1177/1062860618799936. Epub 2018 Sep 21.
8
A Natural Language Processing System That Links Medical Terms in Electronic Health Record Notes to Lay Definitions: System Development Using Physician Reviews.
J Med Internet Res. 2018 Jan 22;20(1):e26. doi: 10.2196/jmir.8669.
9
Text Simplification Using Consumer Health Vocabulary to Generate Patient-Centered Radiology Reporting: Translation and Evaluation.
J Med Internet Res. 2017 Dec 18;19(12):e417. doi: 10.2196/jmir.8536.
10
Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach.
JMIR Med Inform. 2017 Oct 31;5(4):e42. doi: 10.2196/medinform.8531.