Department of Informatics, Donald Bren School of Informatics and Computer Science, University of California, Irvine, Irvine, California, USA.
School of Medicine, University of California, Irvine, Irvine, California, USA.
J Am Med Inform Assoc. 2023 Mar 16;30(4):703-711. doi: 10.1093/jamia/ocad001.
OBJECTIVES: Ambient clinical documentation technology uses automatic speech recognition (ASR) and natural language processing (NLP) to turn patient-clinician conversations into clinical documentation. It is a promising approach to reducing clinician burden and improving documentation quality. However, the performance of current-generation ASR remains inadequately validated. In this study, we investigated the impact of non-lexical conversational sounds (NLCS) on ASR performance. NLCS, such as Mm-hm and Uh-uh, are commonly used to convey important information in clinical conversations; for example, Mm-hm can serve as a "yes" response from the patient to the clinician's question "are you allergic to antibiotics?"

MATERIALS AND METHODS: We evaluated 2 contemporary ASR engines, Google Speech-to-Text Clinical Conversation ("Google ASR") and Amazon Transcribe Medical ("Amazon ASR"), both of which have language models specifically tailored to clinical conversations. The empirical data came from 36 primary care encounters. We conducted a series of quantitative and qualitative analyses to examine the word error rate (WER) and the potential impact of misrecognized NLCS on the quality of clinical documentation.

RESULTS: Of the 135 647 spoken words in the evaluation data, 3284 (2.4%) were NLCS. Among these NLCS, 76 (0.06% of total words, 2.3% of all NLCS) were used to convey clinically relevant information. The overall WER across all spoken words was 11.8% for Google ASR and 12.8% for Amazon ASR. However, both engines performed poorly on NLCS: the WER across frequently used NLCS was 40.8% (Google) and 57.2% (Amazon), and among the NLCS that conveyed clinically relevant information, 94.7% and 98.7%, respectively.

DISCUSSION AND CONCLUSION: Current ASR solutions are not capable of properly recognizing NLCS, particularly those that convey clinically relevant information. Although the volume of NLCS in our evaluation data was small (2.4% of the total corpus, and 0.06% for NLCS that conveyed clinically relevant information), incorrect recognition of them could introduce inaccuracies into clinical documentation and create new patient safety risks.
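The paper does not publish its scoring code, but the WER metric it reports is conventionally defined as WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment of hypothesis against reference, and N is the number of reference words. Below is a minimal sketch of that standard calculation in Python; the function name and the example utterance are illustrative, not taken from the study.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Compute WER = (S + D + I) / N via word-level Levenshtein alignment,
    where N is the number of words in the reference transcript."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = minimum edits to turn reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # i deletions
    for j in range(m + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[n][m] / n if n else 0.0

# Hypothetical example of an NLCS misrecognition: the affirming "mm-hm"
# is dropped entirely from the ASR output.
ref = "mm-hm i am allergic to penicillin".split()
hyp = "i am allergic to penicillin".split()
print(f"WER = {word_error_rate(ref, hyp):.1%}")  # 1 deletion / 6 words = 16.7%
```

As the example suggests, a dropped NLCS costs only one word of WER yet can invert the clinical meaning of the exchange, which is why the paper reports NLCS-specific error rates separately from the overall WER.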