Rowe Hannah P, Gutz Sarah E, Maffei Marc F, Tomanek Katrin, Green Jordan R
MGH Institute of Health Professions, Department of Rehabilitation Sciences, Boston, MA, United States.
Harvard University, Department of Speech and Hearing Bioscience and Technology, Boston, MA, United States.
Front Comput Sci. 2022 Apr;4. doi: 10.3389/fcomp.2022.770210. Epub 2022 Apr 12.
Despite significant advancements in automatic speech recognition (ASR) technology, even the best-performing ASR systems are inadequate for speakers with impaired speech. This inadequacy may be due, in part, to the challenges of acquiring a sufficiently diverse training sample of disordered speech. Dysarthria refers to a group of divergent speech disorders secondary to neurologic injury, and speakers with dysarthria exhibit highly variable speech patterns both within and across individuals. This diversity is currently poorly characterized and, consequently, difficult to represent adequately in disordered-speech ASR corpora. In this paper, we consider the variable expressions of dysarthria within the context of established clinical taxonomies (e.g., the Darley, Aronson, and Brown dysarthria subtypes). We also briefly consider past and recent efforts to capture this diversity quantitatively using speech analytics. Understanding dysarthria diversity from the clinical perspective, and how this diversity may affect ASR performance, could aid in (1) optimizing data collection strategies to minimize bias; (2) ensuring representative ASR training sets; and (3) improving ASR generalization across users, as well as performance for difficult-to-recognize speakers. Our overarching goal is to facilitate the development of robust ASR systems for dysarthric speech using clinical knowledge.