Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models.

Author information

Richter-Pechanski Phillip, Geis Nicolas A, Kiriakou Christina, Schwab Dominic M, Dieterich Christoph

Affiliations

Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg, Germany.

Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany.

Publication information

Digit Health. 2021 Nov 26;7:20552076211057662. doi: 10.1177/20552076211057662. eCollection 2021 Jan-Dec.

Abstract

OBJECTIVE

A vast amount of medical data is still stored in unstructured text documents. We present an automated method for extracting information from unstructured German clinical routine data in the cardiology domain, enabling its use in state-of-the-art, data-driven deep learning projects.

METHODS

We evaluated pre-trained language models for extracting a set of 12 cardiovascular concepts from German discharge letters. We compared three bidirectional encoder representations from transformers (BERT) models pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field.
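As a rough illustration of how manually annotated concept spans could be turned into token-level training labels for fine-tuning, consider the sketch below. The abstract does not describe the tokenization or label scheme actually used, so the whitespace tokenization, the function name `spans_to_token_labels`, the example sentence, and the label names `Medication` and `Diagnosis` are all assumptions made for illustration only.

```python
import re
from typing import List, Tuple

# (start_char, end_char, concept_label), end exclusive. Hypothetical format.
Span = Tuple[int, int, str]


def spans_to_token_labels(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Give each whitespace-delimited token the label of the first
    annotated span it overlaps, or "O" if it overlaps none."""
    labeled = []
    for match in re.finditer(r"\S+", text):
        label = "O"
        for start, end, concept in spans:
            # Overlap test between the token and the annotated span.
            if match.start() < end and match.end() > start:
                label = concept
                break
        labeled.append((match.group(), label))
    return labeled


def find_span(text: str, phrase: str, concept: str) -> Span:
    """Helper for the example: locate a phrase and return it as a span."""
    start = text.index(phrase)
    return (start, start + len(phrase), concept)


# Invented example sentence; not taken from the study data.
text = "Patient erhielt Ramipril 5 mg wegen arterieller Hypertonie"
spans = [
    find_span(text, "Ramipril", "Medication"),               # hypothetical label
    find_span(text, "arterieller Hypertonie", "Diagnosis"),  # hypothetical label
]
token_labels = spans_to_token_labels(text, spans)
```

In a real BERT fine-tuning setup the labels would additionally have to be aligned with the model's subword tokens; that step is omitted here.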

RESULTS

Our best-performing model, based on a publicly available German pre-trained bidirectional encoder representations from transformers (BERT) model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity).
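For readers unfamiliar with the metric, a token-wise micro-averaged F1-score pools true positives, false positives, and false negatives over all tokens and all concept labels before computing precision and recall. The sketch below follows that standard definition; the abstract does not give the exact evaluation procedure, so treating the outside label `"O"` as a non-concept, the function name `micro_prf`, and the toy label sequences are assumptions.

```python
from typing import List, Tuple


def micro_prf(gold: List[str], pred: List[str],
              outside: str = "O") -> Tuple[float, float, float]:
    """Token-wise micro-averaged precision, recall, and F1 over all
    labels except the outside label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            tp += 1  # correct concept label on this token
        else:
            if p != outside:
                fp += 1  # predicted a concept that is wrong or absent in gold
            if g != outside:
                fn += 1  # missed or mislabeled a gold concept token
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy sequences with hypothetical labels; not data from the study.
gold = ["O", "Medication", "O", "Diagnosis", "Diagnosis", "O"]
pred = ["O", "Medication", "Medication", "Diagnosis", "O", "O"]
precision, recall, f1 = micro_prf(gold, pred)
```

Because counts are pooled before averaging, frequent concepts weigh more heavily in a micro-average than rare ones, which is worth keeping in mind when reading a single summary score.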

CONCLUSION

Our results show that state-of-the-art deep learning methods using pre-trained language models are applicable to cardiovascular concept extraction with limited training data. This minimizes annotation effort, which is currently the bottleneck for data-driven deep learning projects in the clinical domain for German and many other European languages.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7222/8637713/6a9399fea5da/10.1177_20552076211057662-fig1.jpg
