Wirtzfeld Michael R, Ibrahim Rasha A, Bruce Ian C
Department of Electrical and Computer Engineering, McMaster University, 1280 Main Street West, Hamilton, L8S 4K1, ON, Canada.
J Assoc Res Otolaryngol. 2017 Oct;18(5):687-710. doi: 10.1007/s10162-017-0627-7. Epub 2017 Jul 26.
Perceptual studies of speech intelligibility have shown that slow variations of the acoustic envelope (ENV) in a small set of frequency bands provide adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other retaining TFS cues; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
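As an illustration of the ENV/TFS terminology used in the abstract, the following is a minimal single-band sketch of how a chimaeric signal can be formed: the Hilbert envelope of one signal is multiplied by the fine structure of another. It is not the authors' stimulus-generation code. The function names (`analytic_signal`, `env_tfs`, `chimaera`), the toy test tones, and the sampling rate are all assumptions made for this example, and the band-splitting filterbank that would normally precede this step is omitted.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence, computed with the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def env_tfs(x):
    """Split a signal into Hilbert envelope (ENV) and fine structure (TFS)."""
    z = analytic_signal(x)
    env = np.abs(z)                       # slowly varying amplitude
    tfs = np.cos(np.angle(z))             # unit-amplitude carrier
    return env, tfs

def chimaera(env_source, tfs_source):
    """Combine the ENV of one signal with the TFS of another (hypothetical helper)."""
    env, _ = env_tfs(env_source)
    _, tfs = env_tfs(tfs_source)
    return env * tfs

# Toy inputs (assumed for illustration): an amplitude-modulated tone and a pure tone.
fs = 16000                                # sampling rate in Hz (assumption)
t = np.arange(fs // 10) / fs              # 100 ms of samples
signal_a = (1.0 + 0.5 * np.sin(2 * np.pi * 20 * t)) * np.sin(2 * np.pi * 1000 * t)
signal_b = np.sin(2 * np.pi * 1500 * t)

# ENV of signal_a carried on the TFS of signal_b.
mix = chimaera(signal_a, signal_b)
```

Multiplying a signal's own ENV and TFS returns the original waveform, which is a quick sanity check on the decomposition; swapping one of the two factors for that of a different signal gives the chimaera.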