Audio Information Processing, Technical University of Munich, Munich, Germany.
Trends Hear. 2023 Jan-Dec;27:23312165231188619. doi: 10.1177/23312165231188619.
Speech intelligibility in cocktail party situations has traditionally been studied with stationary sound sources and stationary participants. Here, speech intelligibility and behavior were investigated during active self-rotation of standing participants in a spatialized speech test. We investigated whether people would rotate to improve speech intelligibility, and whether knowing the target location would be further beneficial. Target sentences appeared randomly at one of four possible locations, 0°, ±90°, or 180° relative to the participant's initial orientation on each trial, while speech-shaped noise was presented from the front (0°). Participants responded naturally with self-rotating motion. Target sentences were presented either without (Audio-only) or with a picture of an avatar (Audio-Visual). In a baseline (Static) condition, participants stood still without visual location cues. Participants' self-orientations undershot the target location and were close to acoustically optimal. Participants oriented in an acoustically optimal way more often, and speech intelligibility for the lateral targets was higher, in the Audio-Visual than in the Audio-only condition. The intelligibility of individual words in Audio-Visual and Audio-only increased during self-rotation towards the rear target but was reduced for the lateral targets compared to Static, which could be mostly, but not fully, attributed to changes in spatial unmasking. A speech intelligibility prediction based on a model of static spatial unmasking, applied across the self-rotations, overestimated participants' performance by 1.4 dB. The results suggest that speech intelligibility is reduced during self-rotation, and that visual cues of location help listeners achieve more optimal self-rotations and better speech intelligibility.
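The idea of an "acoustically optimal" orientation, including the observed undershoot of lateral targets, can be illustrated with a toy better-ear model: rotating so that the target and the frontal masker fall on opposite sides of the head maximizes the signal-to-noise ratio at one ear. The sketch below is an illustrative assumption only — the sinusoidal head-shadow term, the 6 dB shadow depth, and all function names are invented for this example and are not the spatial-unmasking model used in the study.

```python
import math

def ear_level(source_az_deg, head_az_deg, ear_sign, shadow_db=6.0):
    """Approximate received level (dB) at one ear with a simple
    sinusoidal head-shadow term: a source on the ear's side of the
    head is boosted, a source on the far side is attenuated.
    ear_sign = +1 for the left ear, -1 for the right ear.
    (Toy assumption, not the model from the paper.)"""
    rel = math.radians(source_az_deg - head_az_deg)  # source re head
    return shadow_db * ear_sign * math.sin(rel) / 2.0

def better_ear_snr(target_az, noise_az, head_az):
    """Better-ear SNR (dB): the larger of the two per-ear SNRs."""
    snrs = []
    for ear in (+1, -1):
        t = ear_level(target_az, head_az, ear)
        n = ear_level(noise_az, head_az, ear)
        snrs.append(t - n)
    return max(snrs)

def best_orientation(target_az, noise_az=0.0, step=1):
    """Head orientation (deg) maximizing better-ear SNR."""
    return max(range(-180, 181, step),
               key=lambda h: better_ear_snr(target_az, noise_az, h))
```

Under this toy model, the optimal orientation for a +90° target with a frontal masker is 45° (or equivalently -135°) rather than facing the target, i.e., an undershoot, and for the rear target the model favors turning sideways (±90°); both are qualitatively in line with orienting for a better-ear advantage.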