Orozco-Arroyave J R, Hönig F, Arias-Londoño J D, Vargas-Bonilla J F, Daqrouq K, Skodda S, Rusz J, Nöth E
Faculty of Engineering, Universidad de Antioquia, Calle 67 Número 53-108, Medellín 1226, Colombia.
Pattern Recognition Lab, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Martensstraße 3, Erlangen 91058, Germany.
J Acoust Soc Am. 2016 Jan;139(1):481-500. doi: 10.1121/1.4939739.
This study analyzes continuous speech signals of people with Parkinson's disease (PD), using recordings in three languages (Spanish, German, and Czech). A method for characterizing the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is presented. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks are evaluated: isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts. The method proves more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls, with accuracies ranging from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed, confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work is a step forward in the development of computer-aided tools for the automatic assessment of dysarthric speech signals in multiple languages.
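The characterization step described in the abstract (split an utterance into voiced and unvoiced frames, then measure the energy of the unvoiced frames in 25 Bark-scaled bands) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption and not taken from the paper: the autocorrelation-based voicing rule and its 0.5 threshold, the 40 ms frames with 20 ms hop, the Zwicker-Terhardt Hz-to-Bark approximation, and the choice to split the available Bark range into 25 equal-width bands. The function names are hypothetical. The 12 Mel-frequency cepstral coefficients and the classifier are left out, and silent pauses are not handled.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed variant)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def is_voiced(frame, sr, f0_min=75.0, f0_max=500.0, threshold=0.5):
    """Assumed heuristic: a frame is voiced if its normalized autocorrelation
    has a strong peak at a lag inside the plausible pitch range."""
    frame = frame - frame.mean()
    energy = np.dot(frame, frame)
    if energy <= 0.0:
        return False
    # Keep non-negative lags only; ac[0] equals the frame energy.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / f0_max), int(sr / f0_min)
    return ac[lo:hi + 1].max() / energy > threshold

def bark_band_log_energies(frames, sr, n_bands=25):
    """Log energy per Bark band for each frame (rows: frames, cols: bands)."""
    windowed = frames * np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2
    bark = hz_to_bark(np.fft.rfftfreq(frames.shape[1], d=1.0 / sr))
    # Assumption: n_bands equal-width bands on the Bark axis up to Nyquist.
    edges = np.linspace(0.0, bark[-1], n_bands + 1)
    band = np.clip(np.digitize(bark, edges) - 1, 0, n_bands - 1)
    energies = np.stack(
        [power[:, band == b].sum(axis=1) for b in range(n_bands)], axis=1
    )
    return np.log(energies + 1e-12)  # small floor avoids log(0)

def unvoiced_bark_features(x, sr, frame_ms=40.0, hop_ms=20.0, n_bands=25):
    """Return the per-frame voicing mask and Bark features of unvoiced frames."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    voiced = np.array([is_voiced(f, sr) for f in frames])
    return voiced, bark_band_log_energies(frames[~voiced], sr, n_bands)
```

Calling `unvoiced_bark_features(x, sr)` on a mono recording `x` sampled at `sr` Hz returns a boolean mask with one entry per frame and a matrix with one row of 25 log-energies per unvoiced frame. In a full pipeline these rows would typically be summarized per utterance (for example by mean and standard deviation) before classification, but that step is also an assumption here.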