Zheng Zhong, Li Keyi, Feng Gang, Guo Yang, Li Yinan, Xiao Lili, Liu Chengqi, He Shouhuan, Zhang Zhen, Qian Di, Feng Yanmei
Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China.
Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China.
Front Neurosci. 2021 Dec 2;15:744959. doi: 10.3389/fnins.2021.744959. eCollection 2021.
Mandarin-speaking users of cochlear implants (CI) perform more poorly than their English-speaking counterparts. This may be because present CI speech coding schemes are largely based on English. This study aimed to evaluate the relative contributions of temporal envelope (E) cues to Mandarin phoneme (vowel and consonant) and lexical tone recognition, in order to inform speech coding schemes specific to Mandarin. Eleven normal-hearing subjects were tested using acoustic temporal E cues that were extracted from 30 contiguous frequency bands between 80 and 7,562 Hz using the Hilbert transform and divided into five frequency regions. Percent-correct recognition scores were obtained with acoustic E cues presented in three, four, and five frequency regions, and the relative weights of the regions were calculated using the least-squares approach. For stimuli with three, four, and five frequency regions, percent-correct scores for vowel recognition using E cues were 50.43-84.82%, 76.27-95.24%, and 96.58%, respectively; for consonant recognition, 35.49-63.77%, 67.75-78.87%, and 87.87%; and for lexical tone recognition, 60.80-97.15%, 73.16-96.87%, and 96.73%. For frequency regions 1 to 5, the mean weights in vowel recognition were 0.17, 0.31, 0.22, 0.18, and 0.12, respectively; in consonant recognition, 0.10, 0.16, 0.18, 0.23, and 0.33; and in lexical tone recognition, 0.38, 0.18, 0.14, 0.16, and 0.14. The region that contributed most to vowel recognition was Region 2 (502-1,022 Hz), which contains first formant (F1) information; Region 5 (3,856-7,562 Hz) contributed most to consonant recognition; and Region 1 (80-502 Hz), which contains fundamental frequency (F0) information, contributed most to lexical tone recognition.
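The least-squares weighting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each test condition is coded as a binary vector marking which of the five regions are presented, that scores are modeled as a sum of per-region contributions with no intercept, and that the fitted coefficients are normalized to sum to 1. The function names `solve_linear` and `region_weights` are hypothetical, and the scores are synthetic values generated from the reported mean vowel weights, not the study's data.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def region_weights(conditions, scores, n_regions=5):
    """Fit per-region weights by least squares and normalize them to sum to 1.

    conditions: iterable of tuples of region indices (0-based) presented together.
    scores: percent-correct score for each condition.
    Assumption: additive model without an intercept term.
    """
    # Binary design matrix: 1.0 if the region is present in the condition.
    design = [[1.0 if r in cond else 0.0 for r in range(n_regions)]
              for cond in conditions]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(n_regions)]
           for i in range(n_regions)]
    xty = [sum(row[i] * y for row, y in zip(design, scores))
           for i in range(n_regions)]
    raw = solve_linear(xtx, xty)
    total = sum(raw)
    return [w / total for w in raw]


# All combinations of three, four, and five regions (16 conditions in total).
conditions = [c for k in (3, 4, 5) for c in combinations(range(5), k)]

# Synthetic scores built from the reported mean vowel weights (illustration only).
assumed = [0.17, 0.31, 0.22, 0.18, 0.12]
scores = [100.0 * sum(assumed[r] for r in cond) for cond in conditions]

weights = region_weights(conditions, scores)
print([round(w, 2) for w in weights])
```

Because the synthetic scores are exactly additive, the fit recovers the assumed weights; with measured scores the residuals would be nonzero and the normalized coefficients would only approximate each region's contribution.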