Department of Speech, Hearing and Phonetic Sciences, UCL, London, UK.
Department of Electrical and Electronic Engineering, Imperial College, London, UK.
Trends Hear. 2022 Jan-Dec;26:23312165211068629. doi: 10.1177/23312165211068629.
A signal processing approach combining beamforming with mask-informed speech enhancement was assessed by measuring sentence recognition in listeners with mild-to-moderate hearing impairment in adverse listening conditions that simulated the output of behind-the-ear hearing aids in a noisy classroom. Two types of beamforming were compared: binaural, with the two microphones of each aid treated as a single array, and bilateral, where independent left and right beamformers were derived. Binaural beamforming produces a narrower beam, maximising improvement in signal-to-noise ratio (SNR), but eliminates the spatial diversity that is preserved in bilateral beamforming. Each beamformer type was optimised for the true target position and implemented with and without additional speech enhancement in which spectral features extracted from the beamformer output were passed to a deep neural network trained to identify time-frequency regions dominated by target speech. Additional conditions comprising binaural beamforming combined with speech enhancement implemented using Wiener filtering or modulation-domain Kalman filtering were tested in normally-hearing (NH) listeners. Both beamformer types gave substantial improvements relative to no processing, with significantly greater benefit for binaural beamforming. Performance with additional mask-informed enhancement was poorer than with beamforming alone, for both beamformer types and both listener groups. In NH listeners the addition of mask-informed enhancement produced significantly poorer performance than both other forms of enhancement, neither of which differed from the beamformer alone. In summary, the additional improvement in SNR provided by binaural beamforming appeared to outweigh loss of spatial information, while speech understanding was not further improved by the mask-informed enhancement method implemented here.
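As a rough illustration of what "identifying time-frequency regions dominated by target speech" means, the sketch below computes an oracle binary mask from known target and noise magnitudes and applies it to a mixture. This is not the study's method: the network in the study estimates the mask from spectral features of the beamformer output, whereas here the mask is computed directly from the separate signals. The function names, the 0 dB threshold, and the gain floor are all assumptions made for this example.

```python
import numpy as np

def oracle_binary_mask(target_mag, noise_mag, threshold_db=0.0):
    """Mark time-frequency cells whose local SNR exceeds threshold_db.

    Hypothetical helper: an oracle stand-in for a mask that a trained
    network would have to estimate without access to the clean signals.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    local_snr_db = 20.0 * np.log10((target_mag + eps) / (noise_mag + eps))
    return (local_snr_db > threshold_db).astype(float)

def apply_mask(mixture_mag, mask, floor=0.1):
    """Attenuate masked-out cells to `floor` instead of zeroing them.

    The floor value is an assumption chosen for illustration only.
    """
    gain = np.maximum(mask, floor)
    return gain * mixture_mag

# Toy magnitudes for a 2 x 2 (frequency x time) grid.
target_mag = np.array([[2.0, 0.5], [1.0, 4.0]])
noise_mag = np.array([[1.0, 1.0], [3.0, 1.0]])
mixture_mag = target_mag + noise_mag  # crude additive approximation

mask = oracle_binary_mask(target_mag, noise_mag)
enhanced_mag = apply_mask(mixture_mag, mask)
```

Cells where the target exceeds the noise pass through unchanged, while the remaining cells are attenuated. An estimated mask will contain errors that an oracle mask does not, which is one plausible reason why added enhancement need not improve intelligibility.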