Kodish-Wachs Jodi, Agassi Emin, Kenny Patrick, Overhage J Marc
Cerner Corporation, Malvern, PA.
AMIA Annu Symp Proc. 2018 Dec 5;2018:683-689. eCollection 2018.
Conversations, especially those between a clinician and a patient, are important sources of data to support clinical care. To date, clinicians have acted as the sensor that captures these data and records them in the medical record. Automatic speech recognition (ASR) engines have advanced to support continuous speech, to work independently of the speaker, and to deliver continuously improving performance. Near-human levels of performance have been reported for several ASR engines. We undertook a systematic comparison of selected ASR engines for clinical conversational speech. Using audio recorded from unscripted clinical scenarios with two microphones, we evaluated eight ASR engines using word error rate (WER) and the precision, recall, and F1 scores for concept extraction. We found a wide range of word error rates across the ASR engines, with values ranging from 34% to 65%, all falling short of the rates achieved for other conversational speech. Recall for health concepts also ranged from 22% to 74%. Concept recall rates match or exceed expectations given the measured word error rates, suggesting that vocabulary is not the dominant issue.
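The two evaluation metrics named in the abstract can be illustrated with a short sketch. This is not the paper's actual scoring pipeline; it is a minimal, standard implementation of word error rate (word-level Levenshtein distance divided by reference length) and of precision/recall/F1 over sets of extracted concepts, with made-up example inputs.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)


def concept_prf(gold, extracted):
    """Precision, recall, and F1 over sets of extracted health concepts."""
    tp = len(gold & extracted)
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical example: two substitution errors in a five-word reference.
print(wer("the patient reports chest pain",
          "the patient report chess pain"))        # 0.4
print(concept_prf({"chest pain", "hypertension"},
                  {"chest pain"}))                 # (1.0, 0.5, 0.666...)
```

Note that the two metrics can diverge, which is the abstract's point: a transcript with a high WER can still yield good concept recall if the errors fall on non-clinical words.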