Department of Information Engineering, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy.
Sensors (Basel). 2021 Sep 27;21(19):6460. doi: 10.3390/s21196460.
Within the field of Automatic Speech Recognition (ASR) systems, handling impaired speech is a major challenge because standard approaches are ineffective in the presence of dysarthria. The first aim of our work is to confirm the effectiveness of a new speech analysis technique for speakers with dysarthria. This new approach fine-tunes the size and shift parameters of the spectral analysis window used to compute the initial short-time Fourier transform, in order to improve the performance of a speaker-dependent ASR system. The second aim is to determine whether there is a correlation between a speaker's voice features and the optimal window size and shift parameters that minimise the error of an ASR system for that specific speaker. For our experiments, we used both impaired and unimpaired Italian speech: specifically, 30 speakers with dysarthria from the IDEA database and 10 professional speakers from the CLIPS database, both of which are freely available. The results confirm that, when a standard ASR system performs poorly for a speaker with dysarthria, its performance can be improved by using the new speech analysis. Conversely, the new approach is ineffective for unimpaired and mildly impaired speech. Furthermore, there is a correlation between some of the speakers' voice features and their optimal parameters.
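The two ingredients described above, a short-time Fourier transform whose window size and shift are free parameters and a per-speaker search for the pair that minimises recognition error, can be sketched as follows. This is a minimal illustration assuming Python with NumPy, not the authors' pipeline: the Hamming window, the millisecond parameter grids, and the hypothetical `evaluate_wer` callback (standing in for training and testing a speaker-dependent recognizer) are all assumptions. The Pearson coefficient is shown only as one plausible way to test the correlation between a voice feature and the optimal window; the abstract does not state which measure was used.

```python
import numpy as np


def stft_magnitude(signal, sample_rate, window_ms, shift_ms):
    """Magnitude STFT with a Hamming window of `window_ms` and hop of `shift_ms`.

    Assumption: Hamming window and millisecond units are illustrative choices.
    Returns an array of shape (n_frames, win_len // 2 + 1).
    """
    x = np.asarray(signal, dtype=float)
    win_len = int(round(sample_rate * window_ms / 1000))
    hop = int(round(sample_rate * shift_ms / 1000))
    if win_len <= 0 or hop <= 0:
        raise ValueError("window and shift must each span at least one sample")
    if len(x) < win_len:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(x) - win_len) // hop
    window = np.hamming(win_len)
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    # One-sided spectrum of each frame.
    return np.abs(np.fft.rfft(frames, axis=1))


def best_window_and_shift(evaluate_wer, window_grid, shift_grid):
    """Grid search for the (window_ms, shift_ms) pair with the lowest error.

    `evaluate_wer(window_ms, shift_ms)` is a HYPOTHETICAL callback that would
    extract features with these parameters, train and test a speaker-dependent
    recognizer, and return its error rate.
    """
    best = None
    for w in window_grid:
        for s in shift_grid:
            if s > w:  # skip shifts longer than the window (unanalysed gaps)
                continue
            err = evaluate_wer(w, s)
            if best is None or err < best[2]:
                best = (w, s, err)
    return best


def feature_correlation(voice_feature, optimal_param):
    """Pearson correlation between one voice feature and one optimal parameter
    across speakers (an assumed choice of correlation measure)."""
    return float(np.corrcoef(voice_feature, optimal_param)[0, 1])


if __name__ == "__main__":
    sr = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone
    spec = stft_magnitude(tone, sr, window_ms=25, shift_ms=10)
    print(spec.shape)  # (98, 201)

    # Toy error surface standing in for real recognition experiments.
    toy_wer = lambda w, s: (w - 30) ** 2 + (s - 12) ** 2
    print(best_window_and_shift(toy_wer, [20, 25, 30, 35], [5, 10, 12, 15]))
```

Searching the grid per speaker keeps the method speaker-dependent, matching the setting in the abstract; the toy error surface only demonstrates the search mechanics and says nothing about real optimal values.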