Measurement and Prediction of Binaural-Temporal Integration of Speech Reflections.

Author information

1 Department of Speech, Language and Hearing Sciences, Boston University, Boston, MA, USA.

2 Project Group Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology IDMT, Cluster of Excellence Hearing4all, Oldenburg, Germany.

Publication information

Trends Hear. 2019 Jan-Dec;23:2331216519854267. doi: 10.1177/2331216519854267.

DOI: 10.1177/2331216519854267

PMID: 31234732

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6593929/

Abstract

For speech intelligibility in rooms, the temporal integration of speech reflections is typically modeled by separating the room impulse response (RIR) into an early (assumed beneficial for speech intelligibility) and a late part (assumed detrimental). This concept was challenged in this study by employing binaural RIRs with systematically varied interaural phase differences (IPDs) and amplitude of the direct sound and a variable number of reflections delayed by up to 200 ms. Speech recognition thresholds in stationary noise were measured in normal-hearing listeners for 86 conditions. The data showed that direct sound and one or several early speech reflections could be perfectly integrated when they had the same IPD. Early reflections with the same IPD as the noise (but not as the direct sound) could not be perfectly integrated with the direct sound. All conditions in which the dominant speech information was within the early RIR components could be well predicted by a binaural speech intelligibility model using classic early/late separation. In contrast, when amplitude or IPD favored late RIR components, listeners appeared to be capable of focusing on these components rather than on the precedent direct sound. This could not be modeled by an early/late separation window but required a temporal integration window that can be flexibly shifted along the RIR.
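The contrast between a fixed early/late separation and a shiftable integration window can be illustrated with a minimal sketch. This is not the authors' binaural speech intelligibility model: the 50 ms boundary, the rectangular window shape, the toy RIR, and the function names `early_late_energy` and `best_window` are all assumptions chosen for illustration, and the sketch ignores binaural processing (IPDs) entirely.

```python
def early_late_energy(rir, fs, boundary_ms=50.0):
    """Split RIR energy at a fixed boundary (assumed 50 ms) after the first sample."""
    split = int(round(boundary_ms * fs / 1000.0))
    early = sum(x * x for x in rir[:split])
    late = sum(x * x for x in rir[split:])
    return early, late


def best_window(rir, fs, window_ms=50.0):
    """Slide a rectangular window along the RIR.

    Returns (start_ms, energy) for the first window position that
    captures the most energy.
    """
    width = max(1, int(round(window_ms * fs / 1000.0)))
    energies = [x * x for x in rir]
    current = sum(energies[:width])
    best_start, best_energy = 0, current
    for start in range(1, len(rir) - width + 1):
        # Update the running sum: add the sample entering the window,
        # drop the sample leaving it.
        current += energies[start + width - 1] - energies[start - 1]
        if current > best_energy:
            best_start, best_energy = start, current
    return best_start * 1000.0 / fs, best_energy


# Toy RIR (hypothetical): weak direct sound at 0 ms, stronger reflection at 100 ms.
fs = 1000  # samples per second, so one sample equals 1 ms
rir = [0.0] * 250
rir[0] = 0.5
rir[100] = 1.0

early, late = early_late_energy(rir, fs)
start_ms, energy = best_window(rir, fs)
print(f"fixed split: early={early}, late={late}")
print(f"shifted window starts at {start_ms} ms, energy={energy}")
```

With this toy RIR, the fixed split assigns the stronger component to the "late" (assumed detrimental) part, whereas the shifted window moves to cover it. That mirrors, in a very simplified way, the case described above in which amplitude favors a late RIR component.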