Department of Neurology, Mayo Clinic, Rochester, MN.
Department of Radiology, Mayo Clinic, Rochester, MN.
J Speech Lang Hear Res. 2024 Sep 12;67(9):2964-2976. doi: 10.1044/2024_JSLHR-24-00049. Epub 2024 Aug 6.
Transcribing disordered speech can be useful when diagnosing motor speech disorders such as primary progressive apraxia of speech (PPAOS), in which patients produce sound additions, deletions, substitutions, or distortions and/or slow, segmented speech. Because transcription is laborious and requires an experienced listener, using automatic speech recognition (ASR) systems for diagnosis and treatment monitoring is appealing. This study evaluated the efficacy of a readily available ASR system (wav2vec 2.0) in transcribing the speech of patients with PPAOS. It asked whether the word error rate (WER) output by the ASR can differentiate between healthy speech and PPAOS and/or among PPAOS subtypes, whether WER correlates with apraxia of speech (AOS) severity, and how the ASR's errors compare to those noted in manual transcriptions.
Forty-five patients with PPAOS and 22 healthy controls were recorded repeating 13 words, 3 times each, which were transcribed manually and using wav2vec 2.0. The WER and phonetic and prosodic speech errors were compared between groups, and ASR results were compared against manual transcriptions.
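The abstract does not state how WER was calculated. As a minimal sketch, assuming the conventional definition (word-level substitutions, deletions, and insertions divided by the number of reference words), WER for a set of word repetitions could be computed as follows. The function names and the example words are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes the conventional WER definition; the study's exact
    scoring procedure is not given in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average per-item WER over (target, ASR transcript) pairs."""
    items = list(pairs)
    if not items:
        raise ValueError("at least one (reference, hypothesis) pair is required")
    return sum(word_error_rate(r, h) for r, h in items) / len(items)


# Hypothetical single-word targets and ASR outputs, for illustration only.
example_pairs = [
    ("catastrophe", "catastrophe"),    # exact match
    ("catastrophe", "cat as trophy"),  # one substitution plus two insertions
]
```

Under this definition, a single-word target scores 0 only when the ASR output matches it exactly, and a segmented production that the ASR splits into several words can score above 1 for that item.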
Mean overall WER was 0.88 for patients and 0.33 for controls. WER significantly correlated with AOS severity and accurately distinguished between patients and controls but not between AOS subtypes. The phonetic and prosodic errors from the ASR transcriptions were also unable to distinguish between subtypes, whereas errors calculated from human transcriptions were. There was poor agreement in the number of phonetic and prosodic errors between the ASR and human transcriptions.
This study demonstrates that ASR can be useful in differentiating healthy from disordered speech and evaluating PPAOS severity but does not distinguish PPAOS subtypes. ASR transcriptions showed weak agreement with human transcriptions; thus, ASR may be a useful tool for the transcription of speech in PPAOS, but the research questions posed must be carefully considered within the context of its limitations.