Chen Jinying, Jagannatha Abhyuday N, Fodeh Samah J, Yu Hong
Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States.
School of Computer Science, University of Massachusetts, Amherst, MA, United States.
JMIR Med Inform. 2017 Oct 31;5(4):e42. doi: 10.2196/medinform.8531.
BACKGROUND: Medical terms are a major obstacle for patients trying to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions let patients easily access helpful information while reading their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is costly, it is beneficial, and even necessary, to first identify the terms that matter most for patient EHR comprehension.
OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. EHR terms ranked high by ADS will receive higher priority for lay language annotation, that is, for the creation of lay definitions.
METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary, together with transfer learning, to adapt itself to the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features for ADS, including distributed word representations learned from large EHR data. To evaluate ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into target-domain training data (1000 examples) and evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data.
RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. Both ADS systems performed significantly better than the baseline systems (P<.001 for all measures and all conditions). The rich set of learning features contributed substantially to ADS's performance.
CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
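The abstract names feature space augmentation and average precision without spelling either out. The sketch below illustrates both under stated assumptions: it uses the commonly described form of feature space augmentation, in which each feature vector is copied into a shared block plus a domain-specific block, and the standard non-interpolated definition of average precision. Whether ADS uses exactly these formulations is an assumption, and the function names `augment` and `average_precision` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def augment(features: Sequence[float], domain: str) -> List[float]:
    """Map a feature vector into [shared, source-only, target-only] blocks.

    Assumed formulation: source examples fill the shared and source blocks,
    target examples fill the shared and target blocks; the unused block is
    zero. A single linear model trained on the augmented vectors can then
    learn both shared and domain-specific weights.
    """
    x = list(features)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError(f"unknown domain: {domain!r}")


def average_precision(labels_in_ranked_order: Sequence[int]) -> float:
    """Average precision of a ranking, given 0/1 labels in ranked order.

    Precision is computed at the rank of every positive item and averaged
    over the positives; a ranking with no positives scores 0.0.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels_in_ranked_order, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

In this setup, terms labeled through the consumer health vocabulary would play the role of source-domain examples and the expert-annotated terms the role of target-domain examples; that mapping is inferred from the abstract rather than stated in it.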