Automatic speech recognition performance for digital scribes: a performance comparison between general-purpose and specialized models tuned for patient-clinician conversations.

Affiliations

University of California Irvine, Irvine, CA, USA.

University of California San Diego, La Jolla, USA.

Publication information

AMIA Annu Symp Proc. 2023 Apr 29;2022:1072-1080. eCollection 2022.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10148344/

Abstract

One promising way to address physician data entry needs is the development of so-called "digital scribes": tools that aim to automate clinical documentation through automatic speech recognition (ASR) of patient-clinician conversations. Evaluating specialized ASR models in this domain, which is useful for understanding feasibility and development opportunities, has been difficult because most such models were still under development. Following the commercial release of such models, we report an independent evaluation of four models, two general-purpose and two specialized for medical conversation, on a corpus of 36 primary care conversations. We observed word error rates (WER) of 8.8%-10.5% and word-level diarization error rates (WDER) of 1.8%-13.9%, which are generally lower than previously reported values. The findings indicate that, while there is room for improvement, the performance of these specialized models, at least under ideal recording conditions, may be adequate for the development of downstream applications that rely on ASR of patient-clinician conversations.
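
The two metrics reported above can be made concrete with a small sketch. It assumes the standard definition WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-edit alignment against a reference transcript of N words. It also assumes the commonly used word-level diarization definition WDER = (S_IS + C_IS) / (S + C), which scores only aligned (correct or substituted) words and counts those whose hypothesized speaker differs from the reference speaker. The function names `align`, `wer` and `wder`, the toy doctor/patient utterance, and the assumption that hypothesis speaker labels are already mapped to reference roles (DR/PT) are illustrative and hypothetical. The abstract does not describe the paper's exact text normalization or speaker-mapping steps.

```python
from typing import List, Sequence, Tuple

Op = Tuple[str, int, int]  # (operation, ref index, hyp index); -1 = no word


def align(ref: Sequence[str], hyp: Sequence[str]) -> List[Op]:
    """Minimum-edit (Levenshtein) alignment of two word sequences.

    Operations: 'C' correct, 'S' substitution, 'D' deletion (reference word
    missing from hypothesis), 'I' insertion (extra hypothesis word).
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # correct / substitution
                           dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1)        # insertion
    # Walk back from dp[n][m] to recover one optimal sequence of operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + sub:
                ops.append(("C" if sub == 0 else "S", i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("D", i - 1, -1))
            i -= 1
        else:
            ops.append(("I", -1, j - 1))
            j -= 1
    ops.reverse()
    return ops


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """WER = (S + D + I) / N, with N the number of reference words."""
    errors = sum(op != "C" for op, _, _ in align(ref, hyp))
    return errors / len(ref)


def wder(ref: Sequence[str], ref_spk: Sequence[str],
         hyp: Sequence[str], hyp_spk: Sequence[str]) -> float:
    """WDER = (S_IS + C_IS) / (S + C).

    Only correct and substituted words are scored; inserted and deleted words
    have no counterpart whose speaker label could be compared.
    Assumes hyp_spk already uses the same role labels as ref_spk.
    """
    pairs = [(i, j) for op, i, j in align(ref, hyp) if op in ("C", "S")]
    if not pairs:
        return 0.0
    wrong_speaker = sum(ref_spk[i] != hyp_spk[j] for i, j in pairs)
    return wrong_speaker / len(pairs)


if __name__ == "__main__":
    # Toy, hypothetical exchange (already lowercased, punctuation removed).
    ref = "how are you feeling i have a headache".split()
    ref_spk = ["DR"] * 4 + ["PT"] * 4
    hyp = "how are you feeling i have headache".split()  # "a" deleted
    hyp_spk = ["DR"] * 5 + ["PT"] * 2                    # "i" given to DR
    print(f"WER  = {wer(ref, hyp):.3f}")                     # WER  = 0.125
    print(f"WDER = {wder(ref, ref_spk, hyp, hyp_spk):.3f}")  # WDER = 0.143
```

In this toy case, the single deletion gives a WER of 1/8. The one word attributed to the wrong speaker gives a WDER of 1/7, because the deleted word is excluded from the WDER denominator. Since WER is normalized by reference length, many insertions can push it above 1.0. WDER is bounded by 1.0 because it only counts words that exist in both transcripts.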

Cited by

1

Evaluating the Usability, Technical Performance, and Accuracy of Artificial Intelligence Scribes for Primary Care: Competitive Analysis.

JMIR Hum Factors. 2025 Jul 23;12:e71434. doi: 10.2196/71434.

2

Inspired Spine Smart Universal Resource Identifier (SURI): An Adaptive AI Framework for Transforming Multilingual Speech Into Structured Medical Reports.

Cureus. 2025 Mar 26;17(3):e81243. doi: 10.7759/cureus.81243. eCollection 2025 Mar.

3

The Utility and Implications of Ambient Scribes in Primary Care.

JMIR AI. 2024 Oct 4;3:e57673. doi: 10.2196/57673.
