Monir Nasser-Eddine, Magron Paul, Serizel Romain
Université de Lorraine, CNRS, Inria, Loria, Nancy, France.
Trends Hear. 2024 Jan-Dec;28:23312165241292205. doi: 10.1177/23312165241292205.
In acoustic environments where noise and reverberation degrade speech intelligibility, multichannel speech enhancement is a promising solution for individuals with hearing loss. Such algorithms are commonly evaluated at the utterance scale. However, this approach overlooks the fine acoustic differences that phoneme-specific analysis reveals, and may therefore obscure key insights into algorithm performance. This paper presents an in-depth phoneme-scale evaluation of three state-of-the-art multichannel speech enhancement algorithms: filter-and-sum network, minimum variance distortionless response, and Tango. The algorithms are evaluated across different noise conditions and spatial setups, using realistic acoustic simulations with measured room impulse responses and exploiting the diversity offered by multiple microphones in a binaural hearing setup. The phoneme-scale analysis reveals that some phonemes, such as plosives, are heavily affected by environmental acoustics and remain difficult for the algorithms to handle, whereas others, such as nasals and sibilants, improve substantially after enhancement. These results demonstrate important improvements in phoneme clarity in noisy conditions and offer insights that could drive the development of more personalized, phoneme-aware hearing aid technologies. Although this study provides extensive data on physical metrics of the processed speech, these metrics do not necessarily reflect human perception of speech, so the impact of the findings would have to be investigated through listening tests.