Kim Kwang Hyeon, Lee Byung-Jou, Koo Hae-Won
Clinical Research Support Center, Inje University Ilsan Paik Hospital, Inje University College of Medicine, Goyang, Korea.
Department of Neurosurgery, Inje University Ilsan Paik Hospital, Inje University College of Medicine, Goyang, Korea.
Korean J Neurotrauma. 2024 Sep 23;20(3):168-179. doi: 10.13004/kjnt.2024.20.e30. eCollection 2024 Sep.
This study investigates the feasibility of using a pre-trained deep learning Wav2Vec model for speech-to-text analysis in individuals with speech disorders arising from Parkinson's disease (PD).
A publicly available dataset of speech recordings from healthy controls (HC) and patients diagnosed with PD was used. The dataset also provided Hoehn and Yahr (H&Y) staging, Movement Disorder Society Unified Parkinson's Disease Rating Scale (UPDRS) Part I and Part II scores, and gender information. Speech-to-text analysis was performed on the PD patient data with the Wav2Vec model. The tasks comprised word letter classification, word match probability assessment, and analysis of the speech waveform characteristics provided by the model's output.
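To illustrate how a CTC-based speech model such as Wav2Vec can yield both letters and match probabilities, the following is a minimal sketch of greedy CTC decoding. It is not the authors' pipeline: the toy vocabulary, the `greedy_ctc_decode` function, and the use of `<pad>` as the blank token and `|` as the word delimiter are assumptions made for illustration, and the logits would in practice come from the model's frame-level output.

```python
import math

# Hypothetical toy vocabulary (assumption). "<pad>" acts as the CTC blank
# and "|" as the word delimiter.
VOCAB = ["<pad>", "|", "A", "C", "T"]
BLANK_ID = 0


def softmax(row):
    """Convert one frame of logits into probabilities."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_ctc_decode(logits):
    """Greedy CTC decoding of frame-level logits.

    Picks the most likely token per frame, collapses consecutive repeats,
    and drops blanks. Returns the decoded text and, for each emitted token
    (including word delimiters), the probability at the first frame of its run.
    """
    tokens, probs = [], []
    prev_id = None
    for row in logits:
        p = softmax(row)
        best = max(range(len(p)), key=p.__getitem__)
        if best != prev_id and best != BLANK_ID:
            tokens.append(VOCAB[best])
            probs.append(p[best])
        prev_id = best
    text = "".join(tokens).replace("|", " ").strip()
    return text, probs
```

Averaging the returned probabilities over the tokens of a word gives one possible word-level confidence; whether this corresponds to the study's "word match probability" is not stated in the abstract.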
The dataset comprised 20 cases. Among individuals with PD, the mean H&Y score was 2.50±0.67, the mean UPDRS II-part 5 score was 0.70±1.00, and the mean UPDRS III-part 18 score was 0.80±0.98. The number of words in the text decoded after speech recognition averaged 299.10±16.79 in the HC group and 259.80±93.39 in the PD group. The degree of agreement across all syllables was also calculated from the speech process. The accuracy for the read sentences was 0.31 and 0.10, respectively.
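The word counts and accuracy figures above can be pictured with a short sketch. The abstract does not define how accuracy was computed, so the definition below (words matched in order, divided by the number of reference words) is an assumption, as are the function names and the choice of the sample standard deviation.

```python
from difflib import SequenceMatcher
from statistics import mean, stdev


def word_count(text):
    """Number of whitespace-separated words in a decoded transcript."""
    return len(text.split())


def word_match_accuracy(reference, hypothesis):
    """Fraction of reference words matched, in order, by the hypothesis.

    Assumed definition for illustration; not taken from the study.
    """
    ref = reference.upper().split()
    hyp = hypothesis.upper().split()
    if not ref:
        return 0.0
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


def mean_sd(values):
    """Mean and sample standard deviation, for reporting as mean±SD."""
    return mean(values), stdev(values)
```

For example, `word_match_accuracy("the quick brown fox", "the quick fox")` matches three of the four reference words under this definition.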
This study aimed to demonstrate the effectiveness of Wav2Vec in enhancing speech-to-text analysis for patients with speech disorders. The findings could pave the way for clinical tools that improve diagnosis, evaluation, and communication support for this population.