Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA.
Department of Communicative Disorders and Sciences, University at Buffalo, NY.
J Speech Lang Hear Res. 2022 Jun 8;65(6):2128-2143. doi: 10.1044/2022_JSLHR-21-00589. Epub 2022 May 27.
There is increasing interest in using automatic speech recognition (ASR) systems to evaluate impairment severity or speech intelligibility in speakers with dysarthria. We assessed the clinical validity of one currently available off-the-shelf (OTS) ASR system (i.e., a Google Cloud ASR API) for indexing sentence-level speech intelligibility and impairment severity in individuals with amyotrophic lateral sclerosis (ALS), and we provide guidance for potential users of such systems in research and clinical practice.
Using speech samples collected from 52 individuals with ALS and 20 healthy control speakers, we compared the word recognition rate (WRR) of the commercially available Google Cloud ASR API (Machine WRR) to clinician-provided judgments of impairment severity and to listener transcription-based sentence intelligibility (Human WRR). We assessed the internal reliability of Machine and Human WRR by comparing the standard deviation of WRR across sentences to the minimally detectable change (MDC), a clinical benchmark that indicates whether differences in results fall within measurement error. We also evaluated the diagnostic accuracy of Machine and Human WRR for classifying speakers into clinically established severity categories.
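The abstract does not specify the exact scoring procedure for WRR. A minimal sketch of sentence-level word recognition rate, computed as the complement of word error rate from a standard Levenshtein alignment between a reference transcript and an ASR hypothesis, might look like this (hypothetical implementation, not the study's code):

```python
def wrr(reference: str, hypothesis: str) -> float:
    """Word recognition rate, estimated as 1 - word error rate (WER)
    from a Levenshtein (edit-distance) alignment of word sequences.

    Hypothetical sketch: the study's exact scoring rules (text
    normalization, handling of insertions, etc.) are not given in
    the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    wer = dp[len(ref)][len(hyp)] / max(len(ref), 1)
    return max(0.0, 1.0 - wer)
```

Under this scoring, a perfect hypothesis yields a WRR of 1.0, and each deleted or substituted reference word lowers WRR by one over the reference length.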
Human WRR achieved better accuracy than Machine WRR when indexing speech severity, and, although related, the two measures were not strongly correlated. When the speech signal was mixed with noise (noise-augmented ASR) to reduce a ceiling effect, Machine WRR performance improved. Internal reliability metrics were worse for Machine than for Human WRR, particularly for the typical and mildly impaired severity groups, although sentence length significantly affected both Machine and Human WRR.
Results indicated that the OTS ASR system was inadequate for early detection of speech impairment and for grading overall speech severity. Although Machine and Human WRR were correlated, ASR should not be used as a one-to-one proxy for transcription-based speech intelligibility or for clinician severity ratings. Overall, findings suggest that the tested OTS ASR system, Google Cloud ASR, has limited utility for grading clinical speech impairment in speakers with ALS.