Department of Multimedia Systems, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, 11/12 Narutowicza Street, 80-233 Gdansk, Poland.
Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, 11/12 Narutowicza Street, 80-233 Gdansk, Poland.
Sensors (Basel). 2022 Feb 19;22(4):1641. doi: 10.3390/s22041641.
Objective assessment of speech intelligibility is a complex task that must account for a number of factors, such as the way human hearing perceives each speech sub-band differently and the differing physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized, objective measure of the quality of, e.g., the acoustical adaptation of conference rooms or public address systems. Its wide use and its implementation in numerous measurement devices make STI a popular choice when the speech-related quality of rooms has to be estimated. However, the STI measure has a significant drawback that rules it out in some use cases. For instance, if one wishes to enhance speech intelligibility with a nonlinear digital processing algorithm, the STI method is not suitable for measuring the impact of such an algorithm, because it requires that the measurement signal not be altered in a nonlinear way. Consequently, if a nonlinear speech-enhancing algorithm has to be tested, the STI, the standard way of estimating speech transmission, cannot be used. In this work, we propose a method that is based on the STI but modified so that it can be used to estimate the performance of nonlinear speech intelligibility enhancement methods. The proposed approach is based on a broadband comparison of the cumulated energy of the transmitted envelope modulation and the received modulation, so we call it broadband STI (bSTI). Its credibility for signals altered by the environment, or for speech changed nonlinearly by a DSP algorithm, is checked through a comparative analysis of ten selected impulse responses for which a baseline STI value was known.
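The idea of comparing transmitted and received envelope modulation over the whole band can be illustrated with a minimal sketch. It is not the authors' bSTI algorithm, whose details the abstract does not give. It assumes the conventional STI mapping from a modulation ratio to an apparent signal-to-noise ratio clipped to ±15 dB, and an unweighted mean over modulation frequencies. The function names `modulation_depth`, `transmission_index`, and `broadband_index` are hypothetical.

```python
import cmath
import math


def modulation_depth(signal, fs, fm):
    """Estimate the modulation depth of the intensity envelope at fm (Hz).

    Assumes the signal spans a whole number of modulation periods.
    """
    intensity = [x * x for x in signal]  # broadband intensity, no octave filtering
    component = sum(
        value * cmath.exp(-2j * math.pi * fm * n / fs)
        for n, value in enumerate(intensity)
    )
    return 2.0 * abs(component) / sum(intensity)


def transmission_index(m_ratio):
    """Map a modulation ratio to a 0..1 index via the apparent SNR.

    Assumption: the conventional STI mapping with clipping at +/-15 dB.
    """
    m = min(max(m_ratio, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
    snr_db = 10.0 * math.log10(m / (1.0 - m))
    snr_db = min(max(snr_db, -15.0), 15.0)
    return (snr_db + 15.0) / 30.0


def broadband_index(tx, rx, fs, mod_freqs):
    """Unweighted mean index over modulation frequencies (illustrative only)."""
    indices = []
    for fm in mod_freqs:
        ratio = modulation_depth(rx, fs, fm) / modulation_depth(tx, fs, fm)
        indices.append(transmission_index(ratio))
    return sum(indices) / len(indices)


def modulated_envelope(depth, fm, fs, n_samples):
    """Hypothetical test signal whose intensity is 1 + depth * cos(2*pi*fm*t)."""
    return [
        math.sqrt(1.0 + depth * math.cos(2.0 * math.pi * fm * k / fs))
        for k in range(n_samples)
    ]


if __name__ == "__main__":
    fs, fm, n = 1000, 2.0, 2000
    tx = modulated_envelope(0.8, fm, fs, n)  # transmitted: modulation depth 0.8
    rx = modulated_envelope(0.4, fm, fs, n)  # received: modulation depth halved
    print(broadband_index(tx, rx, fs, [fm]))
```

In this toy case the received modulation depth is half the transmitted one, so the apparent signal-to-noise ratio is 0 dB and the index sits at the midpoint of the scale, 0.5. A real measurement would use speech-like test signals, several modulation frequencies, and the weighting defined by the method under study.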