Acoustic and perceptual effects of amplitude and frequency compression on high-frequency speech.

Author information

Alexander Joshua M, Rallapalli Varsha

Affiliation

Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA.

Publication information

J Acoust Soc Am. 2017 Aug;142(2):908. doi: 10.1121/1.4997938.

DOI: 10.1121/1.4997938

PMID: 28863610

Abstract

This study investigated how six different amplification methods influence acoustic properties, and subsequently perception, of high-frequency cues in fricatives that have been processed with conventional full bandwidth amplification or nonlinear frequency compression (NFC), for 12 conditions in total. Amplification methods included linear gain, fast/slow-acting wide dynamic range compression crossed with fixed/individualized compression parameters, and a method with adaptive time constants. Twenty-one hearing-impaired listeners identified seven fricatives in nonsense syllables produced by female talkers. For NFC stimuli, frequency-compressed filters that precisely aligned 1/3-octave bands between input and output were used to quantify effective compression ratio, audibility, and temporal envelope modulation relative to the input. Results indicated significant relationships between these acoustic properties, each of which contributed significantly to fricative recognition across the entire corpus of stimuli. Recognition was significantly better for NFC stimuli compared with full bandwidth stimuli, regardless of the amplification method, which had complementary effects on audibility and envelope modulation. Furthermore, while there were significant differences in recognition across the amplification methods, they were not consistent across phonemes. Therefore, neither recognition nor acoustic data overwhelmingly suggest that one amplification method should be used over another for transmission of high-frequency cues in isolated syllables. Longer-duration stimuli and more realistic listening conditions should be examined.
