Department of Communication Sciences & Disorders, Long Island University, Brooklyn, NY, United States.
Department of Mathematics and Statistics, Utah State University, Logan, UT, United States.
J Med Internet Res. 2022 Oct 20;24(10):e40567. doi: 10.2196/40567.
Most individuals with Parkinson disease (PD) experience a decline in speech intelligibility. Research on the use of automatic speech recognition (ASR) to assess intelligibility is still sparse, especially under conditions that replicate real-life communication challenges (ie, noisy backgrounds). Developing technologies to automatically measure intelligibility in noise can ultimately assist patients in self-managing the voice changes due to the disease.
The goal of this study was to pilot-test and validate the use of a customized web-based app to assess speech intelligibility in noise in individuals with dysarthria associated with PD.
In total, 20 individuals with dysarthria associated with PD and 20 healthy controls (HCs) recorded a set of sentences using their phones. The Google Cloud ASR API was used to automatically transcribe the speakers' sentences. An algorithm was created to embed the speakers' sentences in multitalker babble at a +6 dB signal-to-noise ratio (SNR). ASR performance was compared with that of 30 human listeners who orthographically transcribed the same set of sentences. Data were reduced to a single event, defined as a success if the artificial intelligence (AI) system transcribed a randomly chosen speaker and sentence as well as or better than the average of 3 randomly chosen human listeners. These data were further analyzed by logistic regression to assess whether AI success differed by speaker group (HCs or speakers with dysarthria) or was affected by sentence length. A discriminant analysis was conducted on the human listener data and the AI transcriber data independently to compare the ability of each data set to discriminate between HCs and speakers with dysarthria.
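For readers interested in the pipeline, the noise-embedding and ASR steps could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the file names, 16-bit WAV encoding, and helper names are hypothetical, and the transcription call uses the standard google-cloud-speech Python client.

```python
# Minimal sketch (not the authors' code): mix a recorded sentence with
# multitalker babble at +6 dB SNR, then transcribe it with Google Cloud
# Speech-to-Text. File names and helper names are assumptions.
import io
import numpy as np
import soundfile as sf
from google.cloud import speech

def mix_at_snr(speech_wav, babble_wav, snr_db=6.0):
    """Scale the babble so the speech-to-babble power ratio equals snr_db."""
    speech_sig, sr = sf.read(speech_wav)
    babble_sig, _ = sf.read(babble_wav)
    babble_sig = babble_sig[: len(speech_sig)]      # trim babble to sentence length
    p_speech = np.mean(speech_sig ** 2)
    p_babble = np.mean(babble_sig ** 2)
    gain = np.sqrt(p_speech / (p_babble * 10 ** (snr_db / 10)))
    mixed = speech_sig + gain * babble_sig
    mixed /= np.max(np.abs(mixed)) + 1e-9           # normalize to avoid clipping
    return mixed, sr

def transcribe(mixed, sr):
    """Send the noise-embedded signal to the Google Cloud ASR API."""
    buf = io.BytesIO()
    sf.write(buf, mixed, sr, format="WAV", subtype="PCM_16")
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sr,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(content=buf.getvalue())
    response = client.recognize(config=config, audio=audio)
    return " ".join(r.alternatives[0].transcript for r in response.results)

mixed, sr = mix_at_snr("speaker_sentence.wav", "multitalker_babble.wav", snr_db=6.0)
print(transcribe(mixed, sr))
```

The ASR transcript would then be scored against the target sentence (eg, percentage of words correctly transcribed), in the same way the human listeners' orthographic transcriptions are scored.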
The data analysis indicated a 0.8 probability (95% CI 0.65-0.91) that AI performance would be as good as or better than that of the average human listener. AI transcriber success probability was not found to depend on speaker group. AI transcriber success was found to decrease with sentence length, with an estimated 0.03 loss in the probability of transcribing as well as the average human listener for each one-word increase in sentence length. The AI transcriber data were found to offer the same discrimination of speakers into categories (HCs and speakers with dysarthria) as the human listener data.
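For context, the group and sentence-length effects reported above correspond to a logistic model of the binary AI-success event, and the group-discrimination comparison corresponds to running a discriminant analysis separately on the human and AI scores. A minimal sketch of such an analysis is shown below; the data file and column names are assumptions, not the authors' analysis scripts.

```python
# Minimal sketch (assumed column names, not the authors' scripts): logistic
# regression of the AI-vs-human success event, then linear discriminant
# analysis comparing how well human and AI scores separate the speaker groups.
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical file with columns: ai_success (0/1), group ("HC"/"PD"),
# sentence_length (words), human_score and ai_score (percent words correct).
df = pd.read_csv("transcription_events.csv")

# Logistic regression: does AI success depend on speaker group or sentence length?
logit = smf.logit("ai_success ~ C(group) + sentence_length", data=df).fit()
print(logit.summary())

# Discriminant analysis: classify HC vs dysarthric speakers from each score set.
y = (df["group"] == "PD").astype(int)
for score in ["human_score", "ai_score"]:
    lda = LinearDiscriminantAnalysis()
    acc = cross_val_score(lda, df[[score]], y, cv=5).mean()
    print(f"{score}: mean cross-validated accuracy = {acc:.2f}")
```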
ASR has the potential to assess intelligibility in noise in speakers with dysarthria associated with PD. Our results hold promise for the use of AI with this clinical population, although future work should evaluate a full range of speech severity, as well as the effect of different speaking tasks on ASR performance.