Lentzen Manuel, Madan Sumit, Lage-Rupprecht Vanessa, Kühnel Lisa, Fluck Juliane, Jacobs Marc, Mittermaier Mirja, Witzenrath Martin, Brunecker Peter, Hofmann-Apitius Martin, Weber Joachim, Fröhlich Holger
Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany.
Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, Germany.
JAMIA Open. 2022 Nov 15;5(4):ooac087. doi: 10.1093/jamiaopen/ooac087. eCollection 2022 Dec.
Healthcare data such as clinical notes are primarily recorded in an unstructured manner. If adequately translated into structured data, they can be utilized for health economics and lay the groundwork for better individualized patient care. For structuring clinical notes, deep-learning methods, particularly transformer-based models such as BERT, have recently received much attention. Currently, biomedical applications of such models are primarily focused on the English language. While general-purpose German-language models such as GermanBERT and GottBERT have been published, adaptations for biomedical data are unavailable. This study evaluated the suitability of existing and novel transformer-based models for the German biomedical and clinical domain.
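As a concrete illustration of how such a general-purpose German transformer is queried, the following minimal Python sketch runs masked-token prediction with the Hugging Face transformers library. The checkpoint ID "bert-base-german-cased" and the example sentence are illustrative assumptions, not artifacts of this study.

```python
# Minimal sketch: masked-token prediction with a general-purpose
# German transformer via the Hugging Face "transformers" library.
# The model ID "bert-base-german-cased" is an assumption; the paper's
# GermanBERT/GottBERT checkpoints may be published under other IDs.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-german-cased")

# Hypothetical clinical-style sentence with one masked token.
for prediction in fill_mask("Der Patient klagt über starke [MASK]."):
    print(f"{prediction['token_str']:>15}  score={prediction['score']:.3f}")
```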
We used 8 existing transformer-based models, pre-trained 3 new models on a newly generated biomedical corpus, and systematically compared all of them with each other. We annotated a new dataset of clinical notes and used it together with 4 other corpora (BRONCO150, CLEF eHealth 2019 Task 1, GGPONC, and JSynCC) to perform named entity recognition (NER) and document classification tasks.
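The NER evaluations amount to fine-tuning each checkpoint for token classification. The sketch below shows this setup under stated assumptions: the tag set, model ID, and the single annotated example are hypothetical stand-ins for the real corpus annotations (e.g., from BRONCO150).

```python
# Minimal sketch: fine-tuning a German transformer for NER
# (token classification) with Hugging Face "transformers".
# The label set, model ID, and example are illustrative assumptions,
# not the paper's exact configuration.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-DIAGNOSIS", "I-DIAGNOSIS"]  # hypothetical tag set
tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-german-cased", num_labels=len(labels))

# One annotated example: pre-tokenized words with word-level tags.
words = ["Patient", "mit", "akuter", "Bronchitis"]
word_tags = [0, 0, 1, 2]  # indices into `labels`

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Align word-level tags to subword tokens; special tokens get -100
# so they are ignored by the cross-entropy loss.
aligned = [word_tags[i] if i is not None else -100
           for i in enc.word_ids(batch_index=0)]
outputs = model(**enc, labels=torch.tensor([aligned]))
outputs.loss.backward()  # an optimizer step would follow in training
print(f"loss = {outputs.loss.item():.3f}")
```

The label alignment step matters because subword tokenization splits words into multiple tokens, while the gold annotations are word-level.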
General-purpose language models can be used effectively for biomedical and clinical natural language processing (NLP) tasks; still, our newly trained BioGottBERT model outperformed GottBERT on both clinical NER tasks. Training new biomedical models from scratch, however, proved ineffective.
The domain-adaptation strategy's potential is currently limited due to a lack of pre-training data. Since general-purpose language models are only marginally inferior to domain-specific models, both options are suitable for developing German-language biomedical applications.
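For reference, the domain-adaptation strategy discussed here corresponds to continued masked language model (MLM) pre-training of a general-purpose checkpoint on in-domain text. The following sketch outlines that procedure; the hub ID "uklfr/gottbert-base", the corpus file name, and all hyperparameters are assumptions for illustration, not the paper's training setup.

```python
# Minimal sketch: domain-adaptive pre-training via continued MLM
# training on in-domain text. The checkpoint ID "uklfr/gottbert-base",
# the file "biomedical_de.txt", and the hyperparameters are assumed
# placeholders, not the configuration used for BioGottBERT.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("uklfr/gottbert-base")
model = AutoModelForMaskedLM.from_pretrained("uklfr/gottbert-base")

# Hypothetical plain-text German biomedical corpus, one document per line.
corpus = load_dataset("text", data_files={"train": "biomedical_de.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="biogottbert-mlm",
                           per_device_train_batch_size=8,
                           num_train_epochs=1),
    train_dataset=tokenized,
    # Randomly masks 15% of tokens for the MLM objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()  # the adapted checkpoint is then fine-tuned, e.g., on NER
```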
General-purpose language models perform remarkably well on biomedical and clinical NLP tasks. If larger corpora become available in the future, domain-adapting these models may improve performance.