Department of Communicative Disorders and Deaf Education, Utah State University, Logan.
Department of Speech and Hearing Science, The Ohio State University, Columbus.
J Speech Lang Hear Res. 2023 May 9;66(5):1853-1866. doi: 10.1044/2023_JSLHR-22-00558. Epub 2023 Mar 21.
Background noise reduces speech intelligibility. Time-frequency (T-F) masking is an established signal processing technique that improves intelligibility of neurotypical speech in background noise. Here, we investigated a novel application of T-F masking, assessing its potential to improve intelligibility of neurologically degraded speech in background noise.
Listener participants (n = 422) completed an intelligibility task either in the laboratory or online, listening to and transcribing audio recordings of neurotypical (control) and neurologically degraded (dysarthria) speech under three different processing types: speech in quiet (quiet), speech mixed with cafeteria noise (noise), and speech mixed with cafeteria noise and then subsequently processed by an ideal quantized mask (IQM) to remove the noise.
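The noise and IQM conditions can be illustrated with a minimal sketch. It assumes NumPy, and the function names `mix_at_snr` and `quantized_mask`, the `n_levels` parameter, and the rounding scheme are hypothetical choices for illustration only. The abstract does not specify how the study's IQM was computed, so the mask here is simply an ideal ratio mask rounded to a few attenuation steps, not the authors' implementation.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db (in dB), then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to speech_power / 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise

def quantized_mask(speech_mag, noise_mag, n_levels=4):
    """Assumed scheme: ideal ratio mask from known magnitudes, rounded to n_levels steps in [0, 1]."""
    ratio = speech_mag ** 2 / (speech_mag ** 2 + noise_mag ** 2 + 1e-12)
    return np.round(ratio * (n_levels - 1)) / (n_levels - 1)
```

The mask is "ideal" because it is built from the separate speech and noise signals, which are available only when the mixture is created by the experimenter. Multiplying the mixture's time-frequency magnitudes by the mask attenuates units dominated by noise and retains units dominated by speech.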
We observed significant reductions in intelligibility of dysarthric speech, even at highly favorable signal-to-noise ratios (+11 to +23 dB) that did not impact neurotypical speech. We also observed significant intelligibility improvements from speech in noise to IQM-processed speech for both control and dysarthric speech across a wide range of noise levels. Furthermore, the overall benefit of IQM processing for dysarthric speech was comparable with that for control speech in background noise, and intelligibility data collected in the laboratory were comparable with those collected online.
This study demonstrates proof of concept, validating the application of T-F masks to a neurologically degraded speech signal. Given that intelligibility challenges greatly impact communication, and thus the lives of people with dysarthria and their communication partners, the development of clinical tools to enhance intelligibility in this clinical population is critical.