Moerman Mieke, Pieters Glenn, Martens Jean-Pierre, Van der Borgt Marie-Jeanne, Dejonckere Philippe
Institute of Phoniatrics, University Medical Centre Utrecht, Utrecht, The Netherlands.
Eur Arch Otorhinolaryngol. 2004 Nov;261(10):541-7. doi: 10.1007/s00405-003-0681-0. Epub 2004 Jan 15.
This paper describes our first attempts to develop a method for the objective assessment of the quality of substitution voices. The objective analysis is based on acoustic parameters that characterise short voice and speech samples, such as a sequence of isolated vowels, a sequence of VCV and CVCVCV syllables, or a short sentence. We collected a database of 113 registrations from 68 patients (53 total laryngectomy patients with tracheo-esophageal speech, 14 total laryngectomy patients with esophageal speech and 5 patients with partial frontolateral laryngectomy) and 6 registrations from healthy control persons. Each registration consisted of seven speech utterances and was subjected to both an acoustic analysis and a perceptual evaluation, the latter involving eight parameters such as "overall impression" and "tonicity". Since the goal of our work is to find the acoustic measurements that best support perceptual judgement and make it more precise, it seemed logical to strive for a perceptually based acoustic analysis. We therefore performed the analysis by means of a peripheral auditory model with a built-in fundamental frequency (pitch) extractor. From the frame-level outputs of the analyser (a frame is 10 ms), global objective parameters were computed for the different speech utterances, such as (1) the percentage of voiced frames, (2) the average voicing evidence, (3) the voicing length distribution and (4) the fundamental frequency jitter. To reduce the parameter variability arising from the nature of the speech utterances (e.g., pauses in the signal or errors made by the pitch extractor), the objective parameters were computed using non-standard averaging schemes involving energy weighting and frame selection.
A statistical analysis of the objective parameters confirms that the quality of tracheo-esophageal speech is superior to that of esophageal speech, but inferior to that of normal speech and of speech produced with one vocal fold preserved. Correlations between the objective parameters and the perceptual parameters are moderate.
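As an illustration only, the four global parameters named in the abstract could be derived from 10 ms frame-level outputs roughly as follows. This is a minimal sketch and not the authors' implementation: the `Frame` fields, the relative-energy rule in `select_frames`, the threshold value, and the frame-to-frame definition of jitter are all assumptions, since the abstract does not specify the actual averaging schemes.

```python
from dataclasses import dataclass
from typing import List, Sequence

FRAME_MS = 10  # frame length stated in the abstract


@dataclass
class Frame:
    """Hypothetical frame-level output of the analyser (field names are assumptions)."""
    voiced: bool     # voiced/unvoiced decision of the pitch extractor
    evidence: float  # voicing evidence, assumed to lie in [0, 1]
    f0: float        # fundamental frequency in Hz (ignored when unvoiced)
    energy: float    # frame energy on a linear scale


def select_frames(frames: Sequence[Frame], rel_threshold: float = 0.1) -> List[Frame]:
    """Frame selection (assumed rule): keep frames whose energy is at least
    rel_threshold times the peak energy, so that pauses are discarded."""
    if not frames:
        return []
    peak = max(f.energy for f in frames)
    return [f for f in frames if f.energy >= rel_threshold * peak]


def percent_voiced(frames: Sequence[Frame]) -> float:
    """Percentage of voiced frames among the given frames."""
    if not frames:
        return 0.0
    return 100.0 * sum(f.voiced for f in frames) / len(frames)


def weighted_voicing_evidence(frames: Sequence[Frame]) -> float:
    """Energy-weighted mean of the voicing evidence."""
    total_energy = sum(f.energy for f in frames)
    if total_energy == 0:
        return 0.0
    return sum(f.evidence * f.energy for f in frames) / total_energy


def voiced_run_lengths_ms(frames: Sequence[Frame]) -> List[int]:
    """Durations (ms) of uninterrupted voiced stretches, in order of occurrence."""
    runs, current = [], 0
    for f in frames:
        if f.voiced:
            current += 1
        elif current:
            runs.append(current * FRAME_MS)
            current = 0
    if current:
        runs.append(current * FRAME_MS)
    return runs


def f0_jitter_percent(frames: Sequence[Frame]) -> float:
    """Frame-to-frame F0 jitter (assumed definition): mean absolute F0 difference
    between adjacent voiced frames, relative to their mean F0, in percent."""
    pairs = [(a.f0, b.f0) for a, b in zip(frames, frames[1:]) if a.voiced and b.voiced]
    if not pairs:
        return 0.0
    mean_diff = sum(abs(a - b) for a, b in pairs) / len(pairs)
    mean_f0 = sum((a + b) / 2 for a, b in pairs) / len(pairs)
    return 100.0 * mean_diff / mean_f0
```

In this sketch, the percentage of voiced frames and the voicing evidence would be computed on the output of `select_frames`, while the run lengths and jitter are computed on the unfiltered sequence so that frame adjacency is preserved.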