Merler Michele, Agurto Carla, Peller Julian, Roitberg Esteban, Taitz Alan, Trevisan Marcos A, Navar Indu, Berry James D, Fraenkel Ernest, Ostrow Lyle W, Cecchi Guillermo A, Norel Raquel
IBM Research, Yorktown Heights, NY, USA.
EverythingALS, Peter Cohen Foundation, Los Altos, CA, USA.
NPJ Digit Med. 2025 May 8;8(1):260. doi: 10.1038/s41746-025-01654-7.
Speech dysarthria is a key symptom of neurological conditions such as ALS, yet existing AI models designed to analyze it from audio signals rely on handcrafted features with limited inference performance. Deep learning approaches improve accuracy but lack interpretability. We propose an attention-based deep learning model to assess dysarthria severity based on listener-effort ratings. Using 2,102 recordings from 125 participants, each rated by three speech-language pathologists on a 100-point scale, we trained models directly on recordings collected remotely. Our best model achieved a correlation (R) of 0.92 and an RMSE of 6.78. Attention-based interpretability identified key phonemes, such as vowel sounds influenced by 'r' (e.g., "car," "more"), and isolated inspiration sounds as markers of speech deterioration. This model improves the precision of dysarthria assessment while maintaining clinical interpretability. By improving sensitivity to subtle speech changes, it offers a valuable tool for research and patient care in ALS and other neurological disorders.
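The interpretability mechanism the abstract describes rests on attention pooling: the model assigns a weight to each audio frame, and the weights both drive the severity prediction and indicate which segments (e.g., 'r'-colored vowels, inspiration sounds) the model relied on. The following is a minimal sketch of that idea, not the authors' implementation; the feature dimension, the single learned attention vector, and the linear regression head are illustrative assumptions.

```python
import numpy as np

def attention_pool(frames, w_attn):
    """Softmax-attention pooling over time.

    frames: (time, dim) frame-level acoustic features (e.g., filterbanks).
    w_attn: (dim,) learned attention vector (hypothetical single-head form).
    Returns the pooled feature vector and the per-frame attention weights.
    """
    scores = frames @ w_attn                 # one relevance score per frame
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ frames, weights         # weighted sum over time

def predict_severity(frames, w_attn, w_head, bias):
    """Linear regression head on the pooled feature -> 0-100 severity score."""
    pooled, weights = attention_pool(frames, w_attn)
    score = float(pooled @ w_head + bias)
    return np.clip(score, 0.0, 100.0), weights

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 40))      # 200 frames, 40-dim features
w_attn = rng.standard_normal(40)
w_head = rng.standard_normal(40)

score, weights = predict_severity(frames, w_attn, w_head, bias=50.0)
top_frames = np.argsort(weights)[-5:]        # frames the model attended to most
```

In this formulation, inspecting `weights` against a phoneme-level alignment of the recording is what lets high-attention frames be mapped back to specific sounds, which is the kind of analysis the abstract reports.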