Institut TELECOM, TELECOM Bretagne, Unité Mixte de Recherche Centre National de la Recherche Scientifique 3192 Laboratoire des Sciences et Techniques de l'Information, de la Communication et de la Connaissance, Brest, France.
IEEE Trans Biomed Eng. 2010 Mar;57(3):572-7. doi: 10.1109/TBME.2009.2031097. Epub 2009 Sep 9.
In this paper, we present a quantitative study of the speech fundamental frequency (F0) of cochlear implant-like spectrally reduced speech (SRS). The SRS was synthesized from the subband amplitude and frequency modulations (AM and FM) of original clean speech utterances selected from the TI-digits database. The SRS synthesis algorithm was derived from the frequency amplitude modulation encoding (FAME) strategy proposed by Nie et al. (2005). The normalized mean squared errors (NMSEs), calculated between the F0 of the original clean speech and that of the SRSs, were analyzed. The NMSE analysis revealed greater F0 distortion in the AM-based SRS, which is the acoustic simulation of present-day cochlear implants, than in the FAME-based SRS. This evidence is consistent with the difficulty that current cochlear implant users have in speaker recognition tasks, as reported by Zeng et al. (2005). Further, the results showed that, at low spectral resolution, keeping the rapidly varying FM components reduces the F0 distortion in the FAME-based SRS.
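The abstract does not spell out how the NMSE between two F0 contours is computed, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that both contours are frame-aligned, that a value of 0 marks an unvoiced frame, that only frames voiced in both contours are compared, and that the squared error is normalized by the energy of the original F0. The function name `f0_nmse` and the sample contours are hypothetical.

```python
from typing import Sequence


def f0_nmse(f0_ref: Sequence[float], f0_test: Sequence[float]) -> float:
    """Normalized mean squared error between two frame-aligned F0 contours (Hz).

    Assumptions (not taken from the paper): 0 marks an unvoiced frame, only
    frames voiced in both contours are compared, and the squared error is
    normalized by the energy of the reference F0 over those frames.
    """
    if len(f0_ref) != len(f0_test):
        raise ValueError("F0 contours must have the same number of frames")

    # Keep only the frames where both contours carry a voiced F0 estimate.
    pairs = [(r, t) for r, t in zip(f0_ref, f0_test) if r > 0 and t > 0]
    if not pairs:
        raise ValueError("no frame is voiced in both contours")

    squared_error = sum((r - t) ** 2 for r, t in pairs)
    reference_energy = sum(r ** 2 for r, _ in pairs)
    return squared_error / reference_energy


# Made-up contours for illustration only (Hz per frame, 0 = unvoiced).
f0_original = [0.0, 120.0, 125.0, 130.0, 0.0]
f0_close = [0.0, 121.0, 124.0, 131.0, 0.0]   # small F0 distortion
f0_far = [0.0, 100.0, 150.0, 110.0, 0.0]     # large F0 distortion

print(f0_nmse(f0_original, f0_close))
print(f0_nmse(f0_original, f0_far))
```

Under this definition a larger value indicates a more distorted F0 contour, which is the sense in which the AM-based SRS is reported to fare worse than the FAME-based SRS. Other normalizations (for example, by the variance of the original F0) would be equally plausible readings of "NMSE".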