Rohlfing Matthew L, Buckley Daniel P, Piraquive Jacquelyn, Stepp Cara E, Tracy Lauren F
Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston University School of Medicine, Boston, Massachusetts, U.S.A.
Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts, U.S.A.
Laryngoscope. 2021 Jul;131(7):1599-1607. doi: 10.1002/lary.29082. Epub 2020 Sep 19.
OBJECTIVES/HYPOTHESIS: Interaction with voice recognition systems, such as Siri™ and Alexa™, is an increasingly important part of everyday life. Patients with voice disorders may have difficulty with this technology, leading to frustration and reduction in quality of life. This study evaluates the ability of common voice recognition systems to transcribe dysphonic voices.
STUDY DESIGN: Retrospective evaluation of "Rainbow Passage" voice samples from patients with and without voice disorders.
METHODS: Participants with (n = 30) and without (n = 23) voice disorders were recorded reading the "Rainbow Passage". Recordings were played at a standardized intensity and distance to the dictation programs on Apple iPhone 6S™, Apple iPhone 11 Pro™, and Google Voice™. Word recognition scores were calculated as the proportion of correctly transcribed words. Word recognition scores were compared to auditory-perceptual and acoustic measures.
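The scoring metric above, the proportion of reference words correctly transcribed, can be sketched with a simple word-level alignment. This is an illustrative stdlib-only sketch, not the study's actual scoring procedure; the matching strategy (longest-common-subsequence alignment via `difflib`) is an assumption.

```python
# Minimal sketch of a word recognition score: correctly transcribed words
# divided by total words in the reference passage. The alignment method
# (difflib's longest-matching-blocks) is an assumption, not the study's code.
from difflib import SequenceMatcher

def word_recognition_score(reference: str, transcript: str) -> float:
    """Proportion of reference words matched, in order, in the transcript."""
    ref = reference.lower().split()
    hyp = transcript.lower().split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    # get_matching_blocks() returns in-order aligned runs of equal words.
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct / len(ref)

# Hypothetical example using the opening words of the Rainbow Passage.
ref = "when the sunlight strikes raindrops in the air"
hyp = "when the sunlight strikes rain drops in their"
print(round(word_recognition_score(ref, hyp), 3))
```

A transcript that splits or substitutes words loses credit only for the affected words, matching the "proportion of correctly transcribed words" definition.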
RESULTS: Mean word recognition scores for participants with and without voice disorders were, respectively, 68.6% and 91.9% for Apple iPhone 6S™ (P < .001), 71.2% and 93.7% for Apple iPhone 11 Pro™ (P < .001), and 68.7% and 93.8% for Google Voice™ (P < .001). There were strong, approximately linear associations between CAPE-V ratings of overall severity of dysphonia and word recognition score, with coefficients of determination (R²) of 0.609 (iPhone 6S™), 0.670 (iPhone 11 Pro™), and 0.619 (Google Voice™). These relationships persisted when controlling for diagnosis, age, gender, fundamental frequency, and speech rate (P < .001 for all systems).
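The linear association reported above can be illustrated with a plain Pearson correlation between severity ratings and word recognition scores. The numbers below are hypothetical, made up for demonstration; they are not the study's data, and the study's full model additionally controlled for covariates.

```python
# Illustrative Pearson correlation: severity values and word recognition
# scores below are hypothetical, NOT the study's data.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

severity = [10, 25, 40, 55, 70, 85]        # hypothetical CAPE-V overall severity (0-100)
wrs = [0.95, 0.90, 0.78, 0.70, 0.55, 0.40] # hypothetical word recognition scores
print(round(pearson_r(severity, wrs), 3))  # a strongly negative r, as the abstract reports
```

Squaring r gives the coefficient of determination (R²), the statistic reported for each recognition system.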
CONCLUSIONS: Common voice recognition systems function well with nondysphonic voices but are poor at accurately transcribing dysphonic voices. There was a strong negative correlation between word recognition scores and perceptual ratings of dysphonia severity. As our society increasingly interfaces with automated voice recognition technology, the needs of patients with voice disorders should be considered.
LEVEL OF EVIDENCE: 4. Laryngoscope, 131:1599-1607, 2021.