Importance of tonal envelope cues in Chinese speech recognition.

Authors

Fu Q J, Zeng F G, Shannon R V, Soli S D

Affiliation

Department of Auditory Implants and Perception, House Ear Institute, Los Angeles, California 90057, USA.

Publication information

J Acoust Soc Am. 1998 Jul;104(1):505-10. doi: 10.1121/1.423251.

Abstract

Recent studies have shown that temporal waveform envelope cues can provide significant information for English speech recognition. This study investigated the use of temporal envelope cues in a tonal language: Mandarin Chinese. In this study, the speech was divided into several frequency analysis bands; the amplitude envelope was extracted from each band by half-wave rectification and low-pass filtering and was used to modulate a noise of the same bandwidth as the analysis band. These manipulations preserved temporal and amplitude cues in each frequency band, but removed the spectral detail within each band. Chinese vowels, consonants, tones and sentences were identified by 12 native Chinese-speaking listeners with 1, 2, 3, and 4 noise bands. The results showed that the recognition score of vowels, consonants, and sentences increased monotonically with the number of bands, a pattern similar to that observed in English speech recognition. In contrast, tones were consistently recognized at about 80% correct level, independent of the number of bands. This high level of tone recognition produced a significant difference in the open-set sentence recognition between Chinese (11.0%) and English (2.9%) for the one-band condition where no spectral information was available. The data also revealed that, with primarily temporal cues, the falling-rising tone (tone 3) and the falling tone (tone 4) were more easily recognized than the flat tone (tone 1) and the rising tone (tone 2). This differential pattern in tone recognition resulted in a similar pattern in word recognition: words having either tone 3 or 4 were more likely to be recognized while words having tone 1 and 2 were not. The quantitative role of tones in Chinese speech recognition was further explored using a power-function model and found to play a significant role in relating phoneme recognition to sentence recognition.
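
The signal processing described in the abstract (band splitting, envelope extraction by half-wave rectification and low-pass filtering, then modulation of band-limited noise) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give band edges, the envelope cutoff, or filter designs, so the band edges passed in, the 50 Hz envelope cutoff, the second-order band-pass, the one-pole low-pass, and all function names here are assumptions.

```python
import math
import random


def bandpass(x, lo, hi, fs):
    """Second-order band-pass biquad (Audio EQ Cookbook form).

    Assumed filter choice; the abstract does not specify the analysis filters.
    """
    f0 = math.sqrt(lo * hi)              # geometric centre frequency
    q = f0 / (hi - lo)                   # quality factor from bandwidth
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha               # b1 is zero for this filter type
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope(band, cutoff, fs):
    """Half-wave rectify, then smooth with a one-pole low-pass filter."""
    a = 1 - math.exp(-2 * math.pi * cutoff / fs)
    env, state = [], 0.0
    for v in band:
        state += a * (max(v, 0.0) - state)   # max(v, 0) is the rectifier
        env.append(state)
    return env


def noise_vocode(signal, edges, fs, env_cutoff=50.0, seed=0):
    """Replace each analysis band with noise of the same band, scaled by its envelope.

    edges: ascending band edges in Hz (hypothetical values chosen by the caller);
    len(edges) - 1 is the number of noise bands.
    """
    rng = random.Random(seed)
    out = [0.0] * len(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass(signal, lo, hi, fs), env_cutoff, fs)
        noise = bandpass([rng.uniform(-1, 1) for _ in signal], lo, hi, fs)
        for i, (e, n) in enumerate(zip(env, noise)):
            out[i] += e * n              # envelope modulates the band noise
    return out
```

Under these assumptions, a call such as `noise_vocode(samples, [100, 4000], 16000)` would correspond to a one-band condition, and adding interior edges would give the 2-, 3-, and 4-band conditions. The output keeps each band's amplitude envelope over time while the fine spectral structure inside the band is replaced by noise.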
