Department of Communication Disorders, California State University, Los Angeles, California, USA.
Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana, USA.
Ear Hear. 2020 Mar/Apr;41(2):300-311. doi: 10.1097/AUD.0000000000000756.
The most commonly employed speech processing strategies in cochlear implants (CIs) extract and encode only amplitude modulation (AM) in a limited number of frequency channels. A novel speech processing strategy that encodes both frequency modulation (FM) and AM has been proposed to improve CI performance. Behavioral tests of that strategy showed better speech, speaker, and tone recognition than with the AM-alone strategy. Here, we used scalp-recorded human frequency following responses (FFRs) to examine how the neural representation of vocoded speech sounds differs between AM alone and AM + FM as the spectral and temporal cues are varied. Specifically, we asked whether adding FM to AM improves the neural representation of envelope periodicity (FFRENV) and temporal fine structure (FFRTFS), as reflected in the temporal pattern of the phase-locked neural activity generating the FFR.
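To make the AM versus AM + FM distinction concrete, the following is a minimal sketch of one sine-carrier channel. It is a generic illustration, not the processing strategy evaluated in this study: the function names (`sine_channel`, `upward_crossings`), the 500-Hz center frequency, the 8000-Hz sampling rate, and the constant 100-Hz frequency deviation are all assumptions chosen for the example.

```python
import math

def sine_channel(envelope, fc, fs, fm_dev=None):
    """Synthesize one vocoder channel on a sine carrier (hypothetical helper).

    envelope : per-sample amplitude (the AM cue)
    fc       : channel center frequency in Hz (assumed value)
    fs       : sampling rate in Hz (assumed value)
    fm_dev   : optional per-sample frequency deviation in Hz (the FM cue);
               None gives an AM-alone channel with a fixed-frequency carrier
    """
    out = []
    phase = 0.0
    for i, amp in enumerate(envelope):
        out.append(amp * math.sin(phase))
        dev = fm_dev[i] if fm_dev is not None else 0.0
        # Integrate instantaneous frequency to obtain the carrier phase.
        phase += 2.0 * math.pi * (fc + dev) / fs
    return out

def upward_crossings(x):
    """Count negative-to-positive zero crossings (about one per carrier cycle)."""
    return sum(1 for a, b in zip(x, x[1:]) if a <= 0.0 < b)

fs = 8000.0                      # assumed sampling rate
n = int(round(0.250 * fs))       # 250-msec duration, as in the stimulus
flat_env = [1.0] * n             # flat envelope isolates the carrier behavior

am_only = sine_channel(flat_env, fc=500.0, fs=fs)
am_fm = sine_channel(flat_env, fc=500.0, fs=fs, fm_dev=[100.0] * n)

print(upward_crossings(am_only), upward_crossings(am_fm))
```

In the AM-alone channel the carrier stays at the center frequency, so only the envelope carries information. In the AM + FM channel the carrier frequency also moves with the FM cue, which shows up here as a higher cycle count over the same 250 msec.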
FFRs were recorded from 13 normal-hearing adult listeners in response to the original unprocessed stimulus (a synthetic diphthong /au/ with a 110-Hz fundamental frequency, or F0, and a 250-msec duration) and to 2-, 4-, 8-, and 16-channel sine-vocoded versions of /au/ with AM alone and with AM + FM. Temporal waveforms, autocorrelation analyses, fast Fourier transforms, and stimulus-response spectral correlations were used to analyze both the strength and the fidelity of the neural representation of envelope periodicity (F0) and TFS (formant structure).
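Two of the measures named above, autocorrelation at the F0 period and spectral magnitude at F0, can be sketched as follows. This is an illustrative sketch on a synthetic waveform, not the analysis pipeline used in the study: the function names, the 4400-Hz sampling rate (chosen so that one 110-Hz period is exactly 40 samples), and the two-harmonic test signal are assumptions.

```python
import math

def synth_response(f0=110.0, fs=4400.0, dur=0.250, harmonics=(1.0, 0.5)):
    """Toy periodic waveform standing in for an FFR (hypothetical test signal)."""
    n = int(round(fs * dur))
    return [
        sum(a * math.sin(2.0 * math.pi * f0 * (k + 1) * i / fs)
            for k, a in enumerate(harmonics))
        for i in range(n)
    ]

def normalized_autocorr(x, lag):
    """Autocorrelation at one lag, normalized by the zero-lag energy.

    Values near 1 at the lag of one F0 period indicate strong periodicity.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    energy = sum(v * v for v in xc)
    if energy == 0.0:
        return 0.0
    return sum(xc[i] * xc[i + lag] for i in range(n - lag)) / energy

def dft_magnitude(x, freq, fs):
    """Amplitude of the signal component at a single frequency (direct DFT)."""
    n = len(x)
    re = sum(v * math.cos(2.0 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / n

fs, f0 = 4400.0, 110.0           # fs is an assumed value; f0 matches the stimulus
x = synth_response(f0=f0, fs=fs)
period_lag = int(round(fs / f0)) # 40 samples per F0 period at this fs

periodicity = normalized_autocorr(x, period_lag)
peak_at_f0 = dft_magnitude(x, f0, fs)
off_peak = dft_magnitude(x, 1.5 * f0, fs)  # midway between harmonics

print(periodicity, peak_at_f0, off_peak)
```

A response that phase-locks to the envelope should show a high autocorrelation value at the F0 period and a spectral peak at F0 that stands well above neighboring, non-harmonic frequencies.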
As the number of channels increased, periodicity strength in the FFRENV decreased more for the AM stimuli than for the relatively resilient AM + FM stimuli. Regardless of the number of channels, a clear spectral peak of FFRENV was consistently observed at the stimulus F0 for all the AM + FM stimuli but not for the AM stimuli. The neural representation of temporal fine structure, as revealed by the spectral correlation of FFRTFS, was better for the AM + FM stimuli than for the AM stimuli. The neural representation of the time-varying formant-related harmonics, as revealed by the spectral correlation, was also better for the AM + FM stimuli than for the AM stimuli.
These results are consistent with previously reported behavioral results and suggest that the AM + FM processing strategy elicited brainstem neural activity that better preserved periodicity, temporal fine structure, and time-varying spectral information than the AM processing strategy. The relatively more robust neural representation of AM + FM stimuli observed here likely contributes to the superior performance on speech, speaker, and tone recognition with the AM + FM processing strategy. Taken together, these results suggest that neural information preserved in the FFR may be used to evaluate signal processing strategies considered for CIs.