Acoustic cues to lexical segmentation: a study of resynthesized speech.

Author information

Spitzer Stephanie M, Liss Julie M, Mattys Sven L

Affiliation

Motor Speech Disorders Laboratory, Department of Speech and Hearing Science, Arizona State University, Box 870102, Tempe, Arizona 85281-0102, USA.

Publication information

J Acoust Soc Am. 2007 Dec;122(6):3678-87. doi: 10.1121/1.2801545.

PMID:18247775

Abstract

It has been posited that the role of prosody in lexical segmentation is elevated when the speech signal is degraded or unreliable. Using predictions from Cutler and Norris' [J. Exp. Psychol. Hum. Percept. Perform. 14, 113-121 (1988)] metrical segmentation strategy hypothesis as a framework, this investigation examined how individual suprasegmental and segmental cues to syllabic stress contribute differentially to the recognition of strong and weak syllables for the purpose of lexical segmentation. Syllabic contrastivity was reduced in resynthesized phrases by systematically (i) flattening the fundamental frequency (F0) contours, (ii) equalizing vowel durations, (iii) weakening strong vowels, (iv) combining the two suprasegmental cues, i.e., F0 and duration, and (v) combining the manipulation of all cues. Results indicated that, despite similar decrements in overall intelligibility, F0 flattening and the weakening of strong vowels had a greater impact on lexical segmentation than did equalizing vowel duration. Both combined-cue conditions resulted in greater decrements in intelligibility, but with no additional negative impact on lexical segmentation. The results support the notion of F0 variation and vowel quality as primary conduits for stress-based segmentation and suggest that the effectiveness of stress-based segmentation with degraded speech must be investigated relative to the suprasegmental and segmental impoverishments occasioned by each particular degradation.
