Evaluation of a Digital Scribe: Conversation Summarization for Emergency Department Consultation Calls.

Authors

Sezgin Emre, Sirrianni Joseph Winstead, Kranz Kelly

Affiliations

Nationwide Children's Hospital, Columbus, United States.

IT Research Innovation - Data Science, Nationwide Children's Hospital, Columbus, United States.

Publication information

Appl Clin Inform. 2024 May 15;15(3):600-11. doi: 10.1055/a-2327-4121.

DOI: 10.1055/a-2327-4121
PMID: 38749477
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11268986/
Abstract

OBJECTIVE

We present a proof-of-concept digital scribe system as an Emergency Department (ED) consultation call-based clinical conversation summarization pipeline to support clinical documentation, and report its performance.

MATERIALS AND METHODS

We use four pre-trained large language models to establish the digital scribe system: T5-small, T5-base, PEGASUS-PubMed, and BART-Large-CNN via zero-shot and fine-tuning approaches. Our dataset includes 100 referral conversations among ED clinicians and medical records. We report the ROUGE-1, ROUGE-2, and ROUGE-L to compare model performance. In addition, we annotated transcriptions to assess the quality of generated summaries.
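
As a rough illustration of the metrics named above, the sketch below computes ROUGE-1, ROUGE-2, and ROUGE-L F1 scores for a single reference/candidate pair. It is not the authors' evaluation code: it assumes lowercased whitespace tokenization with no stemming, and the function names and example sentences are hypothetical.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, ref_total, cand_total):
    """Harmonic mean of precision (overlap/candidate) and recall (overlap/reference)."""
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(reference, candidate, n):
    """ROUGE-N F1 based on clipped n-gram overlap (assumed tokenization: lowercase + split)."""
    ref = _ngrams(reference.lower().split(), n)
    cand = _ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # Counter intersection clips repeated n-grams
    return _f1(overlap, sum(ref.values()), sum(cand.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 based on the longest common subsequence."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return _f1(_lcs_length(ref, cand), len(ref), len(cand))


# Hypothetical example pair, not taken from the study's dataset.
reference = "patient has fever and cough"
candidate = "patient has cough"
print(rouge_n_f1(reference, candidate, 1))
print(rouge_n_f1(reference, candidate, 2))
print(rouge_l_f1(reference, candidate))
```

Published ROUGE implementations typically add stemming and more careful tokenization, so scores from this sketch would not be directly comparable with the values reported in the abstract.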

RESULTS

The fine-tuned BART-Large-CNN model performs best on the summarization task, with the highest ROUGE scores (F1 ROUGE-1 = 0.49, F1 ROUGE-2 = 0.23, F1 ROUGE-L = 0.35). In contrast, PEGASUS-PubMed lags notably (F1 ROUGE-1 = 0.28, F1 ROUGE-2 = 0.11, F1 ROUGE-L = 0.22). BART-Large-CNN's performance decreases by more than 50% with the zero-shot approach. Annotations show that BART-Large-CNN achieves 71.4% recall in identifying key information and a 67.7% accuracy rate.
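
The annotation-based recall figure can be read as the share of key information items in the source conversation that also appear in the generated summary. The sketch below shows one way such a figure could be computed overall and per category. The abstract does not describe the annotation scheme, so the data layout, the category names, and the function name are all assumptions, and the counts are illustrative rather than study data.

```python
from collections import defaultdict


def recall_by_category(items):
    """Compute overall and per-category recall.

    items: iterable of (category, captured) pairs, one per annotated key item,
    where captured is True if the generated summary contains the item.
    This layout is an assumption, not the study's annotation format.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, captured in items:
        totals[category] += 1
        hits[category] += int(captured)
    per_category = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / sum(totals.values()) if totals else 0.0
    return overall, per_category


# Hypothetical annotations; categories and counts are made up for illustration.
annotations = [
    ("medication", True),
    ("medication", False),
    ("symptom", True),
    ("symptom", True),
]
overall, per_category = recall_by_category(annotations)
print(overall, per_category)
```

Reporting recall per category as well as overall makes it easier to see which kinds of information a summarizer tends to miss.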

DISCUSSION

The BART-Large-CNN model demonstrates a high level of understanding of clinical dialogue structure, indicated by its performance with and without fine-tuning. Despite some instances of high recall, there is variability in the model's performance, particularly in achieving consistent correctness, suggesting room for refinement. The model's recall ability varies across different information categories.

CONCLUSION

The study provides evidence of the potential of AI-assisted tools to support clinical documentation. Future work should expand the research scope with additional language models and hybrid approaches, and include comparative analyses to measure documentation burden and human factors.
