Relative Contributions of Spectral and Temporal Cues to Korean Phoneme Recognition

Author information

Kim Bong Jik, Chang Son-A, Yang Jing, Oh Seung-Ha, Xu Li

Affiliations

Department of Otorhinolaryngology, Seoul National University College of Medicine, Seoul, Korea.

School of Speech Language Therapy and Aural Rehabilitation, Woosong University, Daejeon, Korea.

Publication information

PLoS One. 2015 Jul 10;10(7):e0131807. doi: 10.1371/journal.pone.0131807. eCollection 2015.

DOI:10.1371/journal.pone.0131807

PMID:26162017

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC4498788/

Abstract

This study aimed to evaluate the relative contributions of spectral and temporal information to Korean phoneme recognition and to compare them with their contributions to English phoneme recognition. Eleven normal-hearing Korean-speaking listeners participated in the study. Korean phonemes, comprising 18 consonants in a /Ca/ format and 17 vowels in a /hVd/ format, were processed through a noise vocoder. Spectral information was controlled by varying the number of channels (1, 2, 3, 4, 6, 8, 12, and 16), whereas temporal information was controlled by varying the lowpass cutoff frequency of the envelope extractor (1 to 512 Hz in octave steps). A total of 80 vocoder conditions (8 numbers of channels × 10 lowpass cutoff frequencies) were presented to listeners for phoneme recognition. While vowel recognition depended predominantly on spectral cues, a tradeoff between spectral and temporal information was evident for consonant recognition. Overall Korean consonant recognition was dramatically lower than English consonant recognition under similar vocoder conditions. The complexity of the Korean consonant repertoire, the three-way distinction of stops in particular, hinders recognition of vocoder-processed phonemes.
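The 8 × 10 design described above can be written out explicitly. The following is a minimal sketch that only enumerates the condition grid from the values given in the abstract; the names `CHANNEL_COUNTS`, `LOWPASS_CUTOFFS_HZ`, and `vocoder_conditions` are illustrative assumptions, not identifiers from the study, and no signal processing is performed.

```python
from itertools import product

# Numbers of vocoder channels (spectral resolution), as listed in the abstract.
CHANNEL_COUNTS = [1, 2, 3, 4, 6, 8, 12, 16]

# Envelope lowpass cutoffs in Hz (temporal resolution):
# 1 to 512 Hz in octave steps, i.e. successive powers of two.
LOWPASS_CUTOFFS_HZ = [2 ** k for k in range(10)]


def vocoder_conditions():
    """Return every (channel count, lowpass cutoff in Hz) pair in the design."""
    return list(product(CHANNEL_COUNTS, LOWPASS_CUTOFFS_HZ))


conditions = vocoder_conditions()
print(len(conditions))  # 8 channel counts x 10 cutoffs = 80
```

Each tuple pairs one spectral setting with one temporal setting, so the full list covers every combination exactly once.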
