On the number of channels needed to understand speech.

Author information

Loizou P C, Dorman M, Tu Z

Affiliation

Department of Electrical Engineering, University of Texas at Dallas, Richardson 75083-0688, USA.

Publication information

J Acoust Soc Am. 1999 Oct;106(4 Pt 1):2097-103. doi: 10.1121/1.427954.

DOI:10.1121/1.427954

PMID:10530032

Abstract

Recent studies have shown that high levels of speech understanding could be achieved when the speech spectrum was divided into four channels and then reconstructed as a sum of four noise bands or sine waves with frequencies equal to the center frequencies of the channels. In these studies speech understanding was assessed using sentences produced by a single male talker. The aim of experiment 1 was to assess the number of channels necessary for a high level of speech understanding when sentences were produced by multiple talkers. In experiment 1, sentences produced by 135 different talkers were processed through n channels (2 ≤ n ≤ 16), synthesized as a sum of n sine waves with frequencies equal to the center frequencies of the filters, and presented to normal-hearing listeners for identification. A minimum of five channels was needed to achieve a high level (90%) of speech understanding. Asymptotic performance was achieved with eight channels, at least for the speech material used in this study. The outcome of experiment 1 demonstrated that the number of channels needed to reach asymptotic performance varies as a function of the recognition task and/or need for listeners to attend to fine phonetic detail. In experiment 2, sentences were processed through 6 and 16 channels and quantized into a small number of steps. The purpose of this experiment was to investigate whether listeners use across-channel differences in amplitude to code frequency information, particularly when speech is processed through a small number of channels. For sentences processed through six channels there was a significant reduction in speech understanding when the spectral amplitudes were quantized into a small number (< 8) of steps. High levels (92%) of speech understanding were maintained for sentences processed through 16 channels and quantized into only 2 steps. The findings of experiment 2 suggest an inverse relationship between the importance of spectral amplitude resolution (number of steps) and spectral resolution (number of channels).
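
The processing described in the abstract (per-channel amplitudes, optional quantization into a few steps, resynthesis as a sum of sine waves at the channel center frequencies) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the filter bank, frequency range, sampling rate, or quantization rule, so the 300-5500 Hz range, logarithmic band spacing, linear quantization levels, frame length, and all function names below are assumptions chosen for illustration. The sketch also assumes the channel envelopes have already been extracted.

```python
import math


def center_frequencies(n, lo=300.0, hi=5500.0):
    """Geometric centers of n log-spaced bands between lo and hi Hz.

    The frequency range and the log spacing are assumptions, not taken
    from the abstract.
    """
    edges = [lo * (hi / lo) ** (i / n) for i in range(n + 1)]
    return [math.sqrt(edges[i] * edges[i + 1]) for i in range(n)]


def quantize(envelopes, steps):
    """Snap each amplitude to the nearest of `steps` levels in [0, peak].

    Levels are evenly spaced on a linear scale (an assumption); `envelopes`
    is a list of frames, each a list of per-channel amplitudes.
    """
    peak = max(max(frame) for frame in envelopes)
    if peak == 0 or steps < 2:
        return [list(frame) for frame in envelopes]
    width = peak / (steps - 1)
    return [[round(a / width) * width for a in frame] for frame in envelopes]


def synthesize(envelopes, freqs, fs=16000, frame_len=64):
    """Sum one sine wave per channel, scaled by that channel's frame amplitude.

    The global sample index keeps each sinusoid phase-continuous across frames.
    """
    out = []
    for t, frame in enumerate(envelopes):
        for k in range(frame_len):
            n = t * frame_len + k
            out.append(sum(a * math.sin(2 * math.pi * f * n / fs)
                           for a, f in zip(frame, freqs)))
    return out
```

For example, `synthesize(quantize(env, 2), center_frequencies(16))` would approximate the 16-channel, 2-step condition for a hypothetical envelope matrix `env` with 16 amplitudes per frame.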