Viability of Open Large Language Models for Clinical Documentation in German Health Care: Real-World Model Evaluation Study.

Authors

Heilmeyer Felix, Böhringer Daniel, Reinhard Thomas, Arens Sebastian, Lyssenko Lisa, Haverkamp Christian

Affiliations

Institute of Digitalization in Medicine, Faculty of Medicine and Medical Center, University of Freiburg, Breisacher Straße 153, Freiburg im Breisgau, 79110, Germany, 49 27039392.

Eye Center, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg im Breisgau, Germany.

Publication information

JMIR Med Inform. 2024 Aug 28;12:e59617. doi: 10.2196/59617.

DOI: 10.2196/59617
PMID: 39195570
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11373371/

Abstract

BACKGROUND

The use of large language models (LLMs) as writing assistance for medical professionals is a promising approach to reduce the time required for documentation, but there may be practical, ethical, and legal challenges in many jurisdictions complicating the use of the most powerful commercial LLM solutions.

OBJECTIVE

In this study, we assessed the feasibility of using nonproprietary LLMs of the GPT variety as writing assistance for medical professionals in an on-premise setting with restricted compute resources, generating German medical text.

METHODS

We trained four 7-billion-parameter models with 3 different architectures for our task and evaluated their performance using a powerful commercial LLM, namely Anthropic's Claude-v2, as a rater. Based on this, we selected the best-performing model and evaluated its practical usability with 2 independent human raters on real-world data.

RESULTS

In the automated evaluation with Claude-v2, BLOOM-CLP-German, a model trained from scratch on German text, achieved the best results. In the manual evaluation by human experts, both raters judged 95 (93.1%) of the 102 reports generated by that model to be usable as is or with only minor changes.
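The reported share can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `usable_share` is hypothetical and not part of the study, and the only inputs taken from the abstract are the counts 95 and 102.

```python
def usable_share(usable: int, total: int) -> float:
    """Return the percentage of usable reports, rounded to one decimal place.

    Hypothetical helper for illustration; not code from the study.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * usable / total, 1)


# Counts as stated in the abstract: 95 of 102 reports rated usable by both raters.
share = usable_share(95, 102)
print(f"{share}% of reports rated usable")
```

Rounding to one decimal place matches the precision at which the abstract reports the figure.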

CONCLUSIONS

The results show that even with restricted compute resources, it is possible to generate medical texts that are suitable for documentation in routine clinical practice. However, the target language should be considered in the model selection when processing non-English text.
