School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe.
Aural Analytics Inc., Tempe, AZ.
J Speech Lang Hear Res. 2023 Aug 17;66(8S):3166-3181. doi: 10.1044/2023_JSLHR-22-00282. Epub 2023 Aug 9.
Oral diadochokinesis is a useful task in assessment of speech motor function in the context of neurological disease. Remote collection of speech tasks provides a convenient alternative to in-clinic visits, but scoring these assessments can be a laborious process for clinicians. This work describes Wav2DDK, an automated algorithm for estimating the diadochokinetic (DDK) rate on remotely collected audio from healthy participants and participants with amyotrophic lateral sclerosis (ALS).
Wav2DDK was developed using a corpus of 970 DDK assessments from healthy and ALS speakers where ground truth DDK rates were provided manually by trained annotators. The clinical utility of the algorithm was demonstrated on a corpus of 7,919 assessments collected longitudinally from 26 healthy controls and 82 ALS speakers. Corpora were collected via the participants' own mobile device, and instructions for speech elicitation were provided via a mobile app. DDK rate was estimated by parsing the character transcript from a deep neural network transformer acoustic model trained on healthy and ALS speech.
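As a rough illustration of the final step described above, the sketch below turns a character transcript into a rate by counting syllable-like tokens and dividing by the recording duration. It is not the Wav2DDK implementation: the syllable pattern, the function names, and the assumption that the duration is supplied separately are all hypothetical choices made for this example.

```python
import re

# Assumed syllable pattern for this sketch: one of /p/, /t/, /k/ followed by
# one or more vowel characters. The real transcript alphabet is not described
# in the abstract.
SYLLABLE = re.compile(r"[ptk][aeiou]+")


def count_syllables(transcript: str) -> int:
    """Count syllable-like tokens in a character transcript."""
    return len(SYLLABLE.findall(transcript.lower()))


def estimate_ddk_rate(transcript: str, duration_s: float) -> float:
    """Return syllables per second for a transcript spanning duration_s seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return count_syllables(transcript) / duration_s
```

For example, under these assumptions a transcript of "pataka pataka" spanning 2.0 seconds contains six syllables and gives a rate of 3.0 syllables per second.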
Algorithm-estimated DDK rates are highly accurate, achieving .98 correlation with manual annotation and an average error of only 0.071 syllables per second. The estimated rate exactly matched ground truth for 83% of files and was within 0.5 syllables per second for 95% of files. Estimated rates achieve high test-retest reliability (r = .95) and show good correlation with the revised ALS functional rating scale speech subscore (r = .67).
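The agreement statistics reported above can be computed from paired estimated and manual rates roughly as follows. This is a minimal sketch under stated assumptions: "average error" is taken to mean mean absolute error, "exact match" is taken to mean a difference of zero, and the function and key names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def agreement_summary(estimated, manual, tol=0.5):
    """Summarize agreement between estimated and manual DDK rates."""
    errors = [abs(e - m) for e, m in zip(estimated, manual)]
    n = len(errors)
    return {
        "pearson_r": pearson_r(estimated, manual),
        # Assumption: the reported average error is a mean absolute error.
        "mean_abs_error": sum(errors) / n,
        # Assumption: an exact match means zero difference between rates.
        "exact_match": sum(err == 0 for err in errors) / n,
        "within_tol": sum(err <= tol for err in errors) / n,
    }
```

Calling `agreement_summary(estimated, manual)` on two lists of rates in syllables per second returns the four quantities as fractions and raw values; multiply the two proportions by 100 to compare them with the percentages quoted above.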
We demonstrate a system for automated DDK estimation that increases efficiency of calculation beyond manual annotation. Thorough analytical and clinical validation demonstrates that the algorithm is not only highly accurate, but also provides a convenient, clinically relevant metric for tracking longitudinal decline in ALS, serving to promote participation and diversity of participants in clinical research.