[Speech coding strategy based on amplitude and frequency modulation for cochlear implants].

Author information

Lin Hongyun, Wang Weidong

Affiliation

Medical Engineering Support Center of Chinese PLA General Hospital, Beijing 100853, China.

Publication information

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2011 Apr;28(2):228-32.

PMID:21604474

Abstract

To improve speech recognition in noise as well as tone recognition, we present a new speech coding strategy for cochlear implants, called one-octave wavelet transform zero-crossing stimulation (WTZS), based on amplitude and frequency modulation. Fifteen volunteers with normal hearing took part in hearing simulation experiments, in which the amplitude (amplitude modulation, AM), zero-crossings (frequency modulation, FM) and gradient parameters were extracted from the processed speech signal in the one-octave wavelet transform domain and used to synthesize the pulsatile stimulation series. The results showed that, in quiet surroundings, phonetic recognition with the amplitude-modulation-only strategy (CIS) was similar to that with the amplitude-and-frequency-modulation strategies (FAME and WTZS), whereas tone perception with CIS was inferior to that with FAME and WTZS. In noisy environments, however, phonetic recognition, tone perception and sentence recognition with WTZS were all better than with CIS and FAME. By using amplitude (AM), zero-crossing (FM) and gradient parameters to synthesize the stimulus, the WTZS strategy can effectively enhance phonetic and tonal-language recognition in noise, and could be used in speech processor design for cochlear implant systems after algorithm optimization.
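The feature extraction outlined in the abstract (amplitude, zero-crossings and gradient per one-octave band) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the wavelet, so a Haar transform stands in for the one-octave decomposition; the gradient is taken as a first difference at each crossing; and the function names `haar_octave_bands` and `zero_crossing_features` are hypothetical.

```python
import math


def haar_octave_bands(signal, levels):
    """Split a signal into detail bands, one per octave.

    Assumption: a Haar discrete wavelet transform is used here only as a
    simple stand-in; the source does not specify the wavelet.
    """
    bands = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        # Detail coefficients: the upper half of the current frequency range.
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        # Approximation coefficients: the lower half, split again next level.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        bands.append(detail)
    return bands


def zero_crossing_features(band):
    """Return (index, gradient, peak_amplitude) for each zero-crossing.

    - index: sample position just after the sign change (FM timing cue)
    - gradient: first difference across the crossing (assumed definition)
    - peak_amplitude: largest |sample| since the previous crossing (AM cue)
    Zero-valued samples are treated as non-negative.
    """
    features = []
    peak = 0.0
    for i in range(1, len(band)):
        prev, cur = band[i - 1], band[i]
        peak = max(peak, abs(prev))
        if (prev < 0) != (cur < 0):
            features.append((i, cur - prev, peak))
            peak = 0.0
    return features


if __name__ == "__main__":
    # Hypothetical test tone: 64 samples of a sine with a period of 16 samples.
    tone = [math.sin(2 * math.pi * n / 16) for n in range(64)]
    for level, band in enumerate(haar_octave_bands(tone, 3), start=1):
        print(level, len(zero_crossing_features(band)))
```

In a simulation of this kind, each crossing would presumably trigger one stimulation pulse in the corresponding channel, with the pulse level derived from the amplitude; how the gradient enters the pulse synthesis is not stated in the abstract and is left open here.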
