Yiu Edwin M L, Murdoch Bruce, Hird Kathryn, Lau Polly
Department of Speech and Hearing Sciences, The University of Hong Kong, 5/F Prince Philip Dental Hospital, Sai Ying Pun.
J Acoust Soc Am. 2002 Sep;112(3 Pt 1):1091-101. doi: 10.1121/1.1500753.
Perceptual voice analysis is a subjective process. Despite reports of varying degrees of intrajudge and interjudge reliability, it is widely used in clinical voice evaluation. One way to improve the reliability of this procedure is to provide judges with signals that serve as external standards, so that comparisons can be made in relation to these "anchor" signals. The present study used a Klatt speech synthesizer to create a set of speech signals, based on a Cantonese sentence, with varying degrees of three different voice qualities. The primary objective was to determine, through a perceptual study, whether different abnormal voice qualities could be synthesized using the "built-in" synthesis parameters. The second objective was to determine the relationship between the acoustic characteristics of the synthesized signals and perceptual judgment. Twenty Cantonese-speaking speech pathologists with at least three years of clinical experience in perceptual voice evaluation were asked to undertake two tasks. The first was to decide whether the voice quality of the synthesized signals was normal or not. The second was to decide whether the abnormal signals should be described as rough, breathy, or vocal fry. The results showed that signals generated with a small degree of aspiration noise were perceived as breathy, while signals with a small degree of flutter or double pulsing were perceived as rough. When the flutter or double pulsing increased further, tremor and vocal fry, rather than roughness, were perceived. Furthermore, to reach a similar level of perceived breathiness or roughness, male voice stimuli required a different amount of aspiration noise, flutter, or double pulsing than female voice stimuli. These findings showed that changes in perceived vocal quality could be achieved by systematic modifications of synthesis parameters.
This opens up the possibility of using synthesized voice signals as external standards or "anchors" to improve the reliability of clinical perceptual voice evaluation.