Joshua G. W. Bernstein, Van Summers, Elena Grassi, Ken W. Grant
Audiology and Speech Center, Scientific and Clinical Studies Section, Walter Reed National Military Medical Center, Bethesda, MD 20889, USA.
J Am Acad Audiol. 2013 Apr;24(4):307-28. doi: 10.3766/jaaa.24.4.6.
Hearing-impaired (HI) individuals with similar ages and audiograms often demonstrate substantial differences in speech-reception performance in noise. Traditional models of speech intelligibility focus primarily on average performance for a given audiogram, failing to account for differences between listeners with similar audiograms. Improved prediction accuracy might be achieved by simulating differences in the distortion that speech may undergo when processed through an impaired ear. Although some attempts to model particular suprathreshold distortions can explain general speech-reception deficits not accounted for by audibility limitations, little has been done to model suprathreshold distortion and predict speech-reception performance for individual HI listeners. Auditory-processing models incorporating individualized measures of auditory distortion, along with audiometric thresholds, could provide a more complete understanding of speech-reception deficits by HI individuals. A computational model capable of predicting individual differences in speech-recognition performance would be a valuable tool in the development and evaluation of hearing-aid signal-processing algorithms for enhancing speech intelligibility.
This study investigated whether biologically inspired models simulating peripheral auditory processing for individual HI listeners produce more accurate predictions of speech-recognition performance than audiogram-based models.
Psychophysical data on spectral and temporal acuity were incorporated into individualized auditory-processing models consisting of three stages: a peripheral stage, customized to reflect individual audiograms and spectral and temporal acuity; a cortical stage, which extracts spectral and temporal modulations relevant to speech; and an evaluation stage, which predicts speech-recognition performance by comparing the modulation content of clean and noisy speech. To investigate the impact of different aspects of peripheral processing on speech predictions, individualized details (absolute thresholds, frequency selectivity, spectrotemporal modulation [STM] sensitivity, compression) were incorporated progressively, culminating in a model simulating level-dependent spectral resolution and dynamic-range compression.
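The three-stage pipeline described above can be sketched in simplified form. This is an illustrative toy, not the paper's actual model: the peripheral stage here is a crude FFT-masking filterbank with a hypothetical quarter-octave bandwidth and an absolute-threshold floor standing in for the individualized cochlear simulation, the cortical stage extracts only temporal (not full spectrotemporal) modulation energy, and the evaluation stage uses a plain correlation between clean and noisy modulation patterns as a stand-in for the paper's intelligibility metric.

```python
import numpy as np

def peripheral_stage(signal, fs, center_freqs, thresholds_db):
    """Toy peripheral stage: bandpass each channel, rectify to get an
    envelope, and apply an absolute-threshold floor (hypothetical
    stand-in for an individualized cochlear model)."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    envelopes = []
    for cf, thr_db in zip(center_freqs, thresholds_db):
        bw = 0.25 * cf  # hypothetical ~quarter-octave bandwidth
        band = spec * ((freqs > cf - bw) & (freqs < cf + bw))
        env = np.abs(np.fft.irfft(band, len(signal)))  # crude envelope
        floor = 10.0 ** (thr_db / 20.0) * 2e-5  # threshold re 20 uPa
        envelopes.append(np.maximum(env, floor))
    return np.array(envelopes)

def cortical_stage(envelopes, fs, mod_rates=(2, 4, 8, 16)):
    """Toy cortical stage: modulation energy per channel in octave-wide
    bands around each nominal modulation rate (temporal only)."""
    out = np.zeros((envelopes.shape[0], len(mod_rates)))
    for i, env in enumerate(envelopes):
        mod_spec = np.abs(np.fft.rfft(env - env.mean()))
        mod_freqs = np.fft.rfftfreq(len(env), 1.0 / fs)
        for j, rate in enumerate(mod_rates):
            in_band = (mod_freqs > rate / np.sqrt(2)) & \
                      (mod_freqs < rate * np.sqrt(2))
            out[i, j] = mod_spec[in_band].sum()
    return out

def evaluation_stage(clean_mod, noisy_mod):
    """Toy evaluation stage: correlation between clean and noisy
    modulation patterns as a proxy intelligibility score."""
    c = clean_mod.ravel() - clean_mod.mean()
    n = noisy_mod.ravel() - noisy_mod.mean()
    denom = np.linalg.norm(c) * np.linalg.norm(n) + 1e-12
    return float(np.dot(c, n) / denom)
```

Individualization in the real model enters through the peripheral parameters (thresholds, filter bandwidths, compression), which is why the peripheral stage takes per-listener inputs even in this sketch.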
Psychophysical and speech-reception data from 11 HI and 6 normal-hearing listeners were used to develop the models.
Eleven individualized HI models were constructed and validated against psychophysical measures of threshold, frequency resolution, compression, and STM sensitivity. Speech-intelligibility predictions were compared with measured performance in stationary speech-shaped noise at signal-to-noise ratios (SNRs) of -6, -3, 0, and 3 dB. Prediction accuracy for the individualized HI models was compared to the traditional audibility-based Speech Intelligibility Index (SII).
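For contrast with the modulation-based models, the audibility-based SII baseline can be sketched as follows. This is a minimal illustration of the core SII idea from ANSI S3.5: per-band SNR is mapped to a band audibility factor in [0, 1] via the (SNR + 15)/30 rule and weighted by band importance. The full standard adds spread of masking, level distortion, and standardized band-importance functions, all omitted here.

```python
import numpy as np

def sii_sketch(speech_db, noise_db, band_importance):
    """Minimal SII-style index: band audibility = clip((SNR + 15)/30)
    weighted by normalized band-importance values. Illustrative only;
    omits masking spread and level-distortion corrections."""
    snr = np.asarray(speech_db, float) - np.asarray(noise_db, float)
    audibility = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)
    weights = np.asarray(band_importance, float)
    weights = weights / weights.sum()
    return float(np.sum(weights * audibility))
```

Because this index depends only on band levels and audiometric audibility, it predicts identical performance for listeners with identical audiograms, which is exactly the limitation the individualized suprathreshold models address.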
Models incorporating individualized measures of STM sensitivity yielded significantly more accurate within-SNR predictions than the SII. Additional individualized characteristics (frequency selectivity, compression) improved the predictions only marginally. A nonlinear model including individualized level-dependent cochlear-filter bandwidths, dynamic-range compression, and STM sensitivity predicted performance more accurately than the SII but was no more accurate than a simpler linear model. Predictions of speech-recognition performance simultaneously across SNRs and individuals were also significantly better for some of the auditory-processing models than for the SII.
A computational model simulating individualized suprathreshold auditory-processing abilities produced more accurate speech-intelligibility predictions than the audibility-based SII. Most of this advantage was realized by a linear model incorporating audiometric and STM-sensitivity information. Although more consistent with known physiological aspects of auditory processing, modeling level-dependent changes in frequency selectivity and gain did not result in more accurate predictions of speech-reception performance.