Evaluation of a Digital Scribe: Conversation Summarization for Emergency Department Consultation Calls.

Author information

Sezgin Emre, Sirrianni Joseph Winstead, Kranz Kelly

Affiliations

Nationwide Children's Hospital, Columbus, United States.

IT Research Innovation - Data Science, Nationwide Children's Hospital, Columbus, United States.

Publication information

Appl Clin Inform. 2024 May 15;15(3):600-11. doi: 10.1055/a-2327-4121.

DOI:10.1055/a-2327-4121

PMID:38749477

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11268986/

Abstract

OBJECTIVE

We present a proof-of-concept digital scribe system: a pipeline that summarizes clinical conversations from Emergency Department (ED) consultation calls to support clinical documentation. We also report its performance.

MATERIALS AND METHODS

We use four pre-trained large language models to establish the digital scribe system: T5-small, T5-base, PEGASUS-PubMed, and BART-Large-CNN, each applied via zero-shot and fine-tuning approaches. Our dataset includes 100 referral conversations among ED clinicians and medical records. We report ROUGE-1, ROUGE-2, and ROUGE-L scores to compare model performance. In addition, we annotated transcriptions to assess the quality of the generated summaries.
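As a rough illustration of the metrics named above, the following sketch computes ROUGE-1, ROUGE-2, and ROUGE-L F1 scores in plain Python. It is not the authors' evaluation code: the abstract does not say which ROUGE toolkit, tokenizer, or stemming was used, so the whitespace tokenizer here is an assumption, and the two example sentences are invented for the demonstration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplification; real toolkits may stem)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(matches, cand_total, ref_total):
    """F1 from an overlap count and the candidate/reference totals."""
    if matches == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = matches / cand_total
    recall = matches / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips repeated n-grams
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: based on the longest common subsequence of tokens."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical sentences, not taken from the study's dataset.
reference = "patient has fever and cough for three days"
candidate = "patient has cough and fever for two days"

print("ROUGE-1 F1:", rouge_n_f1(candidate, reference, 1))
print("ROUGE-2 F1:", rouge_n_f1(candidate, reference, 2))
print("ROUGE-L F1:", rouge_l_f1(candidate, reference))
```

The example shows why the three scores diverge: swapping two words barely affects unigram overlap but removes most matching bigrams and shortens the common subsequence.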

RESULTS

The fine-tuned BART-Large-CNN model performs best on the summarization task, with the highest ROUGE scores (F1 ROUGE-1 = 0.49, F1 ROUGE-2 = 0.23, F1 ROUGE-L = 0.35). In contrast, PEGASUS-PubMed lags notably (F1 ROUGE-1 = 0.28, F1 ROUGE-2 = 0.11, F1 ROUGE-L = 0.22). BART-Large-CNN's performance decreases by more than 50% with the zero-shot approach. Annotations show that BART-Large-CNN achieves 71.4% recall in identifying key information and a 67.7% accuracy rate.
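To make the reported gaps concrete, the short sketch below works only from the six F1 values quoted in this abstract. It computes how far PEGASUS-PubMed trails BART-Large-CNN in relative terms, and the ceiling a zero-shot BART-Large-CNN score would have to fall under for a drop of more than 50%. The zero-shot scores themselves are not given in the abstract, and it does not state which ROUGE variant the 50% figure refers to, so the thresholds are shown for all three as an assumption. The names `relative_drop`, `gaps`, and `half_thresholds` are illustrative.

```python
def relative_drop(baseline, value):
    """Fractional decrease of `value` relative to `baseline`."""
    return (baseline - value) / baseline


# F1 scores of the fine-tuned models, as quoted in the abstract.
FINE_TUNED_BART = {"rouge1": 0.49, "rouge2": 0.23, "rougeL": 0.35}
FINE_TUNED_PEGASUS = {"rouge1": 0.28, "rouge2": 0.11, "rougeL": 0.22}

# Relative gap between the best and the weakest reported model, per metric.
gaps = {
    metric: relative_drop(FINE_TUNED_BART[metric], FINE_TUNED_PEGASUS[metric])
    for metric in FINE_TUNED_BART
}

# A drop of more than 50% means the zero-shot score lies below half the fine-tuned score.
half_thresholds = {metric: score * 0.5 for metric, score in FINE_TUNED_BART.items()}

for metric in FINE_TUNED_BART:
    print(
        f"{metric}: PEGASUS trails BART by {gaps[metric]:.1%}; "
        f"a >50% zero-shot drop implies a score below {half_thresholds[metric]:.3f}"
    )
```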

DISCUSSION

The BART-Large-CNN model demonstrates a high level of understanding of clinical dialogue structure, indicated by its performance with and without fine-tuning. Despite some instances of high recall, there is variability in the model's performance, particularly in achieving consistent correctness, suggesting room for refinement. The model's recall ability varies across different information categories.

CONCLUSION

The study provides evidence of the potential of AI-assisted tools to support clinical documentation. Future work should expand the research scope with additional language models and hybrid approaches, and include comparative analysis to measure documentation burden and human factors.
