1 Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, Germany.
2 Cluster of Excellence "Hearing4All", Oldenburg, Germany.
Trends Hear. 2019 Jan-Dec;23:2331216519862982. doi: 10.1177/2331216519862982.
Speech audiometry is an essential part of audiological diagnostics and clinical measurements. Developing a speech recognition test takes a long time, depending on the size of the speech corpus and the amount of optimization required. The aim of this study was to examine whether this development effort could be reduced by using synthetic speech in speech audiometry, specifically in a matrix test for speech recognition. For this purpose, the speech material of the German matrix test was replicated using a preselected commercial system to generate the synthetic speech files. In contrast to the conventional matrix test, no level adjustments or optimization tests were performed while producing the synthetic speech material. Evaluation measurements were conducted by presenting both versions of the German matrix test (with natural or synthetic speech), alternately and at three different signal-to-noise ratios, to 48 young, normal-hearing participants. Psychometric functions were fitted to the empirical data. Speech recognition thresholds were 0.5 dB signal-to-noise ratio higher (worse) for the synthetic speech, while slopes were equal for both speech types. Nevertheless, speech recognition scores were comparable with the literature, and the threshold difference lay within the range observed between recordings of two different natural speakers. Although no optimization was applied, the synthetic-speech signals led to equivalent recognition across the different test lists and word categories. The outcomes of this study indicate that using synthetic speech in speech recognition tests could considerably reduce development costs and evaluation time. This offers the opportunity to expand the speech corpus for speech recognition tests with acceptable effort.
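The fitting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the logistic psychometric function commonly used for matrix tests, p(SNR) = 1 / (1 + exp(4 · s · (SRT − SNR))), where SRT is the speech recognition threshold (50% recognition) and s is the slope at the SRT. The abstract does not state which fitting procedure was used; the logit-linear least-squares fit below is a simplification chosen for illustration. The function names, the three SNR values, and the slope of 0.15 per dB are hypothetical and are not data from the study; only the 0.5 dB threshold offset mirrors the reported result.

```python
import math


def psychometric(snr_db, srt_db, slope):
    """Logistic psychometric function: proportion of words recognized.

    srt_db is the SNR at 50% recognition; slope is the gradient at the
    SRT in proportion per dB.
    """
    return 1.0 / (1.0 + math.exp(4.0 * slope * (srt_db - snr_db)))


def fit_psychometric(snrs_db, scores):
    """Estimate (srt_db, slope) from recognition scores.

    Uses logit(p) = 4 * slope * (snr - srt), which is linear in snr, and
    fits a least-squares line. Scores must lie strictly between 0 and 1.
    This is an illustrative simplification, not the study's procedure.
    """
    logits = [math.log(p / (1.0 - p)) for p in scores]
    n = len(snrs_db)
    mean_x = sum(snrs_db) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in snrs_db)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(snrs_db, logits))
    gradient = sxy / sxx                  # equals 4 * slope
    intercept = mean_y - gradient * mean_x  # equals -4 * slope * srt
    return -intercept / gradient, gradient / 4.0


# Hypothetical example: three SNRs, equal slopes, thresholds 0.5 dB apart.
snrs = [-9.0, -7.0, -5.0]
natural = [psychometric(x, -7.0, 0.15) for x in snrs]
synthetic = [psychometric(x, -6.5, 0.15) for x in snrs]

srt_nat, slope_nat = fit_psychometric(snrs, natural)
srt_syn, slope_syn = fit_psychometric(snrs, synthetic)
print(f"SRT difference: {srt_syn - srt_nat:.2f} dB SNR")
print(f"Slopes: {slope_nat:.3f} vs {slope_syn:.3f} per dB")
```

With real data, scores of exactly 0 or 1 would make the logit undefined, so a maximum-likelihood fit of the same function is the more robust choice in practice.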