An ideal compressed mask for increasing speech intelligibility without sacrificing environmental sound recognition.

Author information

Johnson Eric M, Healy Eric W

Affiliation

Department of Speech and Hearing Science, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA.

Publication information

J Acoust Soc Am. 2024 Dec 1;156(6):3958-3969. doi: 10.1121/10.0034599.

DOI:10.1121/10.0034599

PMID:39666959

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11646135/

Abstract

Hearing impairment is often characterized by poor speech-in-noise recognition. State-of-the-art laboratory-based noise-reduction technology can eliminate background sounds from a corrupted speech signal and improve intelligibility, but it can also hinder environmental sound recognition (ESR), which is essential for personal independence and safety. This paper presents a time-frequency mask, the ideal compressed mask (ICM), that aims to provide listeners with improved speech intelligibility without substantially reducing ESR. This is accomplished by limiting the maximum attenuation that the mask performs. Speech intelligibility and ESR for hearing-impaired and normal-hearing listeners were measured using stimuli that had been processed by ICMs with various levels of maximum attenuation. This processing resulted in significantly improved intelligibility while retaining high ESR performance for both types of listeners. It was also found that the same level of maximum attenuation provided the optimal balance of intelligibility and ESR for both listener types. It is argued that future deep-learning-based noise reduction algorithms may provide better outcomes by balancing the levels of the target speech and the background environmental sounds, rather than eliminating all signals except for the target speech. The ICM provides one such simple solution for frequency-domain models.
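The idea of limiting a mask's maximum attenuation can be illustrated with a minimal sketch. The abstract does not give the ICM's formula, so the code below is an assumption, not the authors' definition: it takes a commonly used ideal ratio mask gain, sqrt(S / (S + N)), for each time-frequency unit and floors it so that no unit is attenuated by more than a chosen number of decibels. The function name `attenuation_limited_mask` and its parameters are hypothetical, introduced only for illustration.

```python
import math


def attenuation_limited_mask(speech_power, noise_power, max_attenuation_db):
    """Return per-unit gains of a ratio mask with a capped attenuation.

    speech_power, noise_power: sequences of non-negative power values,
        one per time-frequency unit (assumed known, as in any "ideal" mask).
    max_attenuation_db: largest attenuation the mask may apply, in dB.
    """
    # Convert the attenuation limit to a minimum amplitude gain,
    # e.g. 20 dB -> 0.1.
    floor_gain = 10.0 ** (-max_attenuation_db / 20.0)

    mask = []
    for s, n in zip(speech_power, noise_power):
        total = s + n
        # Assumed ratio-mask form; a silent unit is passed through unchanged.
        ratio_gain = math.sqrt(s / total) if total > 0 else 1.0
        # Never attenuate by more than the limit, so background
        # sounds remain audible at a reduced level.
        mask.append(max(ratio_gain, floor_gain))
    return mask


# Three units: speech only, noise only, equal speech and noise.
gains = attenuation_limited_mask([1.0, 0.0, 1.0], [0.0, 1.0, 1.0], 20.0)
```

In this sketch, a noise-only unit is scaled by the floor gain instead of being removed, which is the sense in which background environmental sounds are turned down while the target speech is preserved. Setting a very large `max_attenuation_db` approaches an unconstrained ratio mask.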
