Brandi Jett, Emily Buss, Virginia Best, Jacob Oleson, Lauren Calandruccio
Department of Psychological Sciences, Case Western Reserve University, Cleveland, OH.
Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill.
J Speech Lang Hear Res. 2021 Apr 14;64(4):1390-1403. doi: 10.1044/2021_JSLHR-20-00450. Epub 2021 Mar 30.
Purpose: Three experiments were conducted to better understand the role of between-word coarticulation in masked speech recognition. Specifically, we explored whether naturally coarticulated sentences supported better masked speech recognition than sentences derived from individually spoken, concatenated words. We hypothesized that sentence recognition thresholds (SRTs) would be similar for coarticulated and concatenated sentences in a noise masker but would be better for coarticulated sentences in a speech masker.

Method: Sixty young adults participated (n = 20 per experiment). An adaptive tracking procedure was used to estimate SRTs in the presence of noise or two-talker speech maskers. Targets in Experiments 1 and 2 were matrix-style sentences, whereas targets in Experiment 3 were semantically meaningful sentences. All experiments included coarticulated and concatenated targets; Experiments 2 and 3 included a third target type, concatenated keyword-intensity-matched (KIM) sentences, in which the words were concatenated but individually scaled to replicate the intensity contours of the coarticulated sentences.

Results: Regression analyses evaluated the main effects of target type, masker type, and their interaction. Across all three experiments, effects of target type were small (< 2 dB). In Experiment 1, SRTs were slightly poorer for coarticulated than for concatenated sentences. In Experiment 2, coarticulation facilitated speech recognition relative to the concatenated KIM condition. When listeners had access to semantic context (Experiment 3), a coarticulation benefit was observed in noise but not in the speech masker.

Conclusions: Overall, differences between SRTs for sentences with and without between-word coarticulation were small. Beneficial effects of coarticulation were observed only relative to the concatenated KIM targets; for unscaled concatenated targets, it appeared that consistent audibility across the sentence offset any benefit of coarticulation. Contrary to our hypothesis, effects of coarticulation were generally not more pronounced in speech maskers than in noise maskers.