Gerratt B R, Kreiman J
Division of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, California 90095-1794, USA.
J Acoust Soc Am. 2001 Nov;110(5 Pt 1):2560-6. doi: 10.1121/1.1409969.
Much previous research has demonstrated that listeners do not agree well when using traditional rating scales to measure pathological voice quality. Although these findings may indicate that listeners are inherently unable to agree in their perception of such complex auditory stimuli, another explanation implicates the particular measurement method (rating scale judgments) as the culprit. An alternative method of assessing quality, listener-mediated analysis-synthesis, was devised to assess this possibility. In this new approach, listeners explicitly compare synthetic and natural voice samples, and adjust speech synthesizer parameters to create auditory matches to voice stimuli. This method is designed to replace unstable internal standards for qualities like breathiness and roughness with externally presented stimuli, to overcome major hypothetical sources of disagreement in rating scale judgments. In a preliminary test of the reliability of this method, listeners were asked to adjust the signal-to-noise ratio for 12 synthetic pathological voices so that the resulting stimuli matched the natural target voices as well as possible. For comparison to the synthesis judgments, listeners also judged the noisiness of the natural stimuli in a separate task using a traditional visual-analog rating scale. For 9 of the 12 voices, agreement among listeners was significantly (and substantially) greater for the synthesis task than for the rating scale task. Response variances for the two tasks did not differ for the remaining three voices. However, a second experiment showed that the synthesis settings that listeners selected for these three voices were within a difference limen, and therefore observed differences were perceptually insignificant. These results indicate that listeners can in fact agree in their perceptual assessments of voice quality, and that analysis-synthesis can measure perception reliably.