Luo Xiao, Zhou Le, Adelgais Kathleen, Zhang Zhan
Department of Management Science and Information Systems, Oklahoma State University, Stillwater, OK USA.
School of Medicine, Indiana University, Bloomington, IN USA.
J Healthc Inform Res. 2025 Mar 19;9(3):494-512. doi: 10.1007/s41666-025-00193-w. eCollection 2025 Sep.
This study investigates the potential of advanced automatic speech recognition (ASR) technology for transcribing and recognizing medical information during patient encounters, with the aim of enabling real-time clinical documentation to alleviate clinicians' workload. While ASR holds promise, its effectiveness in noisy and dynamic medical settings, such as emergency medical services (EMS), remains underexplored. To address this, four ASR engines (Google Speech-to-Text Clinical Conversation, OpenAI Speech-to-Text, Amazon Transcribe Medical, and Azure Speech-to-Text) were evaluated using 40 EMS simulation recordings. Transcriptions were analyzed for accuracy across 23 electronic health record (EHR) categories relevant to EMS. Google Speech-to-Text Clinical Conversation showed the best overall performance, excelling in categories such as "mental state" (F1 = 1.0), "allergies" (F1 = 0.912), and "electrolytes" (F1 = 1.0). However, all engines struggled with critical EMS categories like "airway" (F1 = 0.524) and "pupils" (F1 = 0.542). These findings highlight the limitations of current ASR technologies and the need for further advancements to improve accuracy and usability in time-sensitive and high-pressure medical environments.
The online version contains supplementary material available at 10.1007/s41666-025-00193-w.
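The per-category F1 scores reported in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of F1 as the harmonic mean of precision and recall, computed from counts of correctly recognized, spuriously recognized, and missed items in one EHR category. The abstract does not describe the authors' exact scoring procedure, so the names `CategoryCounts` and `f1_score` and the counts below are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class CategoryCounts:
    """Hypothetical tallies for one EHR category (not data from the study)."""
    true_positives: int   # items present in the reference and in the transcript
    false_positives: int  # items in the transcript but not in the reference
    false_negatives: int  # items in the reference that the transcript missed


def f1_score(counts: CategoryCounts) -> float:
    """Standard F1: 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denominator = (
        2 * counts.true_positives
        + counts.false_positives
        + counts.false_negatives
    )
    if denominator == 0:
        return 0.0
    return 2 * counts.true_positives / denominator


# Illustrative counts only: 8 correct, 2 spurious, 2 missed.
example = CategoryCounts(true_positives=8, false_positives=2, false_negatives=2)
print(f"F1 = {f1_score(example):.3f}")  # F1 = 0.800
```

Under this definition, a category reaches F1 = 1.0 only when there are no spurious and no missed items, while values near 0.5 mean that errors are roughly as frequent as correct recognitions.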