The Relative Weight of Temporal Envelope Cues in Different Frequency Regions for Mandarin Sentence Recognition.

Authors

Guo Yang, Sun Yuanyuan, Feng Yanmei, Zhang Yujun, Yin Shankai

Affiliation

Department of Otolaryngology Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, No. 600, Yishan Road, Xuhui District, Shanghai 200233, China.

Publication information

Neural Plast. 2017;2017:7416727. doi: 10.1155/2017/7416727. Epub 2017 Jan 19.

DOI:10.1155/2017/7416727

PMID:28203463

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5288535/

Abstract

Acoustic temporal envelope (E) cues carrying speech information are distributed across the frequency spectrum. To investigate the relative weight of E cues in different frequency regions for Mandarin sentence recognition, E information was extracted from 30 contiguous bands spanning 80-7,562 Hz using Hilbert decomposition and then allocated to five frequency regions. Recognition scores were obtained from 40 normal-hearing listeners who were presented with acoustic E cues from one or two random regions. The scores ranged from 8.2% to 16.3% when E information from only one region was available, but from 57.9% to 87.7% when E information from two frequency regions was presented, suggesting a synergistic effect among the temporal E cues in different frequency regions. Next, the relative contributions of the E information from the five frequency regions to sentence perception were computed using a least-squares approach. The results showed that, for Mandarin Chinese, a tonal language, the temporal E cues of Frequency Region 1 (80-502 Hz) and Region 3 (1,022-1,913 Hz) contributed more to sentence intelligibility than those of the other regions, particularly Region 1 (80-502 Hz), which contained fundamental frequency (F0) information.
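As a rough illustration of the Hilbert-based envelope extraction mentioned in the abstract, the sketch below isolates one frequency band and takes the magnitude of its analytic signal. It is a minimal sketch under stated assumptions, not the authors' processing chain: the function names `bandpass_fft` and `hilbert_envelope` are hypothetical, the brick-wall FFT mask stands in for whatever filter bank the study actually used, and the input is a synthetic amplitude-modulated tone rather than speech. Only the band edges (80-502 Hz, Region 1) come from the abstract.

```python
import numpy as np

def bandpass_fft(signal, fs, f_lo, f_hi):
    """Keep only spectral components in [f_lo, f_hi) Hz.
    Assumption: an ideal FFT mask, used here purely for illustration."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def hilbert_envelope(signal):
    """Temporal envelope as the magnitude of the analytic signal."""
    n = len(signal)
    spectrum = np.fft.fft(signal)
    # Zero the negative frequencies and double the positive ones.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * gain)
    return np.abs(analytic)

# Synthetic test signal (hypothetical, not study data):
# a 300 Hz carrier modulated by a slow 4 Hz envelope.
fs = 16000                       # sampling rate in Hz (assumed)
t = np.arange(fs) / fs           # one second of samples
true_env = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
tone = true_env * np.sin(2 * np.pi * 300 * t)

# Extract the envelope within Region 1 (80-502 Hz, from the abstract).
band = bandpass_fft(tone, fs, 80.0, 502.0)
recovered = hilbert_envelope(band)
```

For this narrowband test signal the recovered envelope should closely track the 4 Hz modulator. With real speech, the same two steps would be repeated for each analysis band before the band envelopes are grouped into regions.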
