Hokuyukai Neurological Hospital, 4-30, 2jo, 2cho-me, Nijuyonken, Nishi-ku, Sapporo, 063-0802, Japan.
Department of Neurology, Faculty of Medicine and Graduate School of Medicine, Hokkaido University, Kita 15, Nishi 7, Kita-ku, Sapporo, 060-8638, Japan.
J Neurol. 2024 Feb;271(2):1004-1012. doi: 10.1007/s00415-023-12091-5. Epub 2023 Nov 21.
Assessing dysarthria features in patients with neurodegenerative diseases helps diagnose underlying pathologies. Although deep neural network (DNN) techniques have been widely adopted in various audio processing tasks, few studies have tested whether DNNs can help differentiate neurodegenerative diseases using patients' speech data. This study evaluated whether a DNN model using a transformer architecture could differentiate patients with Parkinson's disease (PD) from patients with spinocerebellar degeneration (SCD) using speech data.
Speech data were recorded from 251 patients with PD and 101 patients with SCD while they read a passage. We fine-tuned a pre-trained DNN model on log-mel spectrograms generated from the speech data, training it to predict whether an input spectrogram came from a patient with PD or a patient with SCD. We used fivefold cross-validation to evaluate predictive performance in terms of the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity.
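The evaluation protocol described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that folds are split at the patient level and stratified by diagnosis, that the model outputs a score in [0, 1], and that PD is coded as the positive class (1) with a 0.5 decision threshold; the abstract states none of these. The function names `stratified_kfold`, `auc`, and `confusion_metrics` are hypothetical.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split indices into k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative case.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, scores, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) at a fixed threshold."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Cohort sizes from the abstract; the coding 1 = PD, 0 = SCD is an assumption.
labels = [1] * 251 + [0] * 101
folds = stratified_kfold(labels, k=5)
```

In each round, one fold would serve as the held-out test set while the model is fine-tuned on the other four, and the four metrics would be computed on the held-out scores.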
Across the five cross-validation folds, the trained model achieved a mean ± standard deviation AUC of 0.93 ± 0.04, accuracy of 0.87 ± 0.03, sensitivity of 0.83 ± 0.05, and specificity of 0.89 ± 0.05.
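A "mean ± standard deviation" summary of this kind is computed from one metric value per fold. The sketch below shows the arithmetic only. The per-fold values are illustrative placeholders, not results from the study, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the abstract does not say which estimator was used. The helper name `summarize` is hypothetical.

```python
import statistics


def summarize(per_fold_values):
    """Format per-fold metric values as 'mean ± SD' with two decimals."""
    mean = statistics.mean(per_fold_values)
    # Sample standard deviation (n - 1 denominator); an assumption here.
    sd = statistics.stdev(per_fold_values)
    return f"{mean:.2f} ± {sd:.2f}"


# Placeholder values for five folds, chosen only to illustrate the calculation.
example_fold_values = [0.80, 0.85, 0.90, 0.75, 0.70]
summary = summarize(example_fold_values)
```

With only five folds, the standard deviation is itself a rough estimate, so it describes fold-to-fold variability rather than a precise confidence interval.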
The DNN model can differentiate speech data of patients with PD from that of patients with SCD with relatively high accuracy and AUC. The proposed method can be used as a non-invasive, easy-to-perform screening method to differentiate PD from SCD using patient speech and is expected to be applied to telemedicine.