

Strategy for developing a speech recognition model specialized for patients with depression or Parkinson's disease with small size speech database.

Author information

Yoon Seojin, Maeng Seri, Kim Ryul, Lee Sangmin

Affiliations

Department of Electrical Engineering, Inha University, Incheon, 22212 Republic of Korea.

Department of Psychiatry, Inha University Hospital, Inha University College of Medicine, Incheon, 22332 Republic of Korea.

Publication information

Biomed Eng Lett. 2024 May 23;14(5):1049-1055. doi: 10.1007/s13534-024-00389-w. eCollection 2024 Sep.

Abstract

Most speech recognition models currently in use were developed for the speech of normal subjects. Patients with depression or Parkinson's disease (PD) show different speech characteristics, and the recognition rate for their speech is lower than for normal subjects. This study explores a model that improves speech recognition accuracy for individuals with depression or PD, with the aim of providing them with more accurate services. Considering the speech features of these patients, we designed the model on the assumption that understanding the overall meaning and context of speech through global information, rather than local information, is more effective for improving recognition accuracy. We propose the m-Globalformer, a model based on the Globalformer architecture, which combines the squeeze-and-excitation (SE) module with the Transformer. The m-Globalformer makes greater use of global information by modifying the base SE module. Because patient speech data are limited, the model uses a pre-training and fine-tuning strategy: it is first trained on a large-scale dataset of normal speech and then fine-tuned on a small-scale dataset from patients with depression or PD. In our experiments, the m-Globalformer demonstrated superior performance, achieving character error rates (CER) of 11.28% for depression and 19.67% for PD.
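For readers unfamiliar with the reported metric, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between the reference transcript and the recognized text, divided by the number of reference characters. The function names `edit_distance` and `cer` are illustrative, and the handling of whitespace and punctuation is an assumption; the abstract does not state the exact normalization the authors used.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference character
            insertion = curr[j - 1] + 1  # add a hypothesis character
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Assumption: every character, including spaces, counts toward the length.
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four gives a CER of 25%.
    print(f"{cer('abcd', 'abed'):.2%}")
```

Under this definition, a CER of 11.28% corresponds to roughly 11 character-level edits per 100 reference characters.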



