Extracting Family History of Patients From Clinical Narratives: Exploring an End-to-End Solution With Deep Learning Models.

Author information

Yang Xi, Zhang Hansi, He Xing, Bian Jiang, Wu Yonghui

Affiliations

Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States.

Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, United States.

Publication information

JMIR Med Inform. 2020 Dec 15;8(12):e22982. doi: 10.2196/22982.

Abstract

BACKGROUND

Patients' family history (FH) is a critical risk factor associated with numerous diseases. However, FH information is not well captured in structured databases; instead, it is often documented in clinical narratives. Natural language processing (NLP) is the key technology for extracting patients' FH from clinical narratives. In 2019, the National NLP Clinical Challenge (n2c2) organized shared tasks to solicit NLP methods for FH information extraction.

OBJECTIVE

This study presents our end-to-end FH extraction system developed during the 2019 n2c2 open shared task as well as the new transformer-based models that we developed after the challenge. We seek to develop a machine learning-based solution for FH information extraction without task-specific rules created by hand.

METHODS

We developed deep learning-based systems for FH concept extraction and relation identification. We explored deep learning models, including long short-term memory-conditional random fields (LSTM-CRFs) and bidirectional encoder representations from transformers (BERT), and developed ensemble models using a majority voting strategy. To further optimize performance, we systematically compared 3 different strategies for using BERT output representations for relation identification.
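
As an illustration of the majority voting strategy mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that voting happens per token over label sequences produced by several models; the abstract does not state the voting granularity. The label names and the tie-breaking rule are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token labels from several models by majority vote.

    predictions[i][j] is the label that model i assigns to token j.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same tokens")

    voted = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels)
        top = max(counts.values())
        # Tie-break (an assumption): keep the label from the earliest
        # model in the list among those with the highest count.
        voted.append(next(l for l in token_labels if counts[l] == top))
    return voted


# Hypothetical BIO-style labels for three tokens from three models.
model_outputs = [
    ["B-FamilyMember", "O", "B-Observation"],
    ["B-FamilyMember", "O", "O"],
    ["O", "O", "B-Observation"],
]
ensemble_labels = majority_vote(model_outputs)
```

With an odd number of models and two candidate labels per token, ties cannot occur; with more labels or an even number of models, the tie-breaking rule determines the result, so the model order matters.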

RESULTS

Our system was among the top-ranked systems (3 out of 21) in the challenge. Our best system achieved micro-averaged F1 scores of 0.7944 and 0.6544 for concept extraction and relation identification, respectively. After the challenge, we further explored new transformer-based models and improved performance on the two subtasks to 0.8249 and 0.6775, respectively. For relation identification, our system achieved a performance comparable to that of the best system (0.6810) reported in the challenge.
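
For readers unfamiliar with the metric, a micro-averaged F1 score pools true positives, false positives, and false negatives across all documents before computing precision and recall. The sketch below shows that calculation under the assumption of exact-match scoring over sets of extracted items; it is not the official n2c2 evaluation script, and the example tuples are invented.

```python
from typing import Hashable, Sequence, Set


def micro_f1(gold_by_doc: Sequence[Set[Hashable]],
             pred_by_doc: Sequence[Set[Hashable]]) -> float:
    """Micro-averaged F1: pool counts over all documents, then score."""
    if len(gold_by_doc) != len(pred_by_doc):
        raise ValueError("gold and predicted document lists differ in length")

    tp = fp = fn = 0
    for gold, pred in zip(gold_by_doc, pred_by_doc):
        tp += len(gold & pred)   # items found in both sets
        fp += len(pred - gold)   # predicted but not in gold
        fn += len(gold - pred)   # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented (family member, observation) pairs for two documents.
gold = [{("father", "diabetes")},
        {("mother", "cancer"), ("sister", "asthma")}]
pred = [{("father", "diabetes")},
        {("mother", "cancer"), ("aunt", "asthma")}]
score = micro_f1(gold, pred)
```

Because counts are pooled, documents with many annotations influence the score more than documents with few, which is the main difference from a macro average.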

CONCLUSIONS

This study demonstrated the feasibility of utilizing deep learning methods to extract FH information from clinical narratives.
