Noufi Camille, Berger Jonathan, Frank Michael, Parker Karen, Bowling Daniel L
Stanford University, Center for Computer Research in Music and Acoustics, Stanford, CA, USA.
Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, USA.
Proc IEEE Int Conf Acoust Speech Signal Process. 2023 Jun;2023. doi: 10.1109/icassp49357.2023.10095942. Epub 2023 May 5.
In this paper, we propose a method for removing linguistic information from speech in order to isolate paralinguistic indicators of affect. The immediate utility of this method lies in clinical tests of sensitivity to vocal affect that are not confounded by language, an ability that is impaired in a variety of clinical populations. The method is based on simultaneous recordings of speech audio and electroglottographic (EGG) signals. The speech audio signal is used to estimate the average vocal tract filter response and the amplitude envelope. The EGG signal supplies a direct correlate of voice source activity that is mostly independent of phonetic articulation. These signals are used to create a third signal designed to capture as much paralinguistic information from the vocal production system as possible, maximizing the retention of bioacoustic cues to affect while eliminating phonetic cues to verbal meaning. To evaluate the success of this method, we studied the perception of corresponding speech audio and transformed EGG signals in an affect rating experiment with online listeners. The results show a high degree of similarity in the perceived affect of matched signals, indicating that our method is effective.
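The signal flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the average vocal tract filter response or the amplitude envelope is estimated, so the sketch assumes a smoothed long-term average magnitude spectrum for the former and a rectified moving average for the latter. All function names, frame sizes, and window lengths are hypothetical, and the speech and EGG recordings are assumed to be time-aligned, sampled at the same rate, and at least 1024 samples long.

```python
import numpy as np


def average_magnitude_response(speech, frame_len=1024, hop=256, smooth_bins=9):
    """Long-term average magnitude spectrum of the speech, smoothed over frequency.

    Assumption: a crude stand-in for the "average vocal tract filter response".
    Averaging over time blurs the moment-to-moment formant movements that
    carry phonetic information.
    """
    window = np.hanning(frame_len)
    starts = range(0, len(speech) - frame_len + 1, hop)
    spectra = [np.abs(np.fft.rfft(window * speech[s:s + frame_len])) for s in starts]
    mean_spec = np.mean(spectra, axis=0)
    kernel = np.ones(smooth_bins) / smooth_bins
    return np.convolve(mean_spec, kernel, mode="same")


def amplitude_envelope(signal, win_len=512):
    """Amplitude envelope via full-wave rectification and a moving average."""
    kernel = np.ones(win_len) / win_len
    return np.convolve(np.abs(signal), kernel, mode="same")


def transform_egg(speech, egg, eps=1e-8):
    """Shape an EGG signal with the speech's average spectrum and envelope."""
    n = min(len(speech), len(egg))
    speech, egg = speech[:n], egg[:n]

    # 1. Apply the time-averaged speech response to the EGG spectrum
    #    (zero-phase filtering by multiplication in the frequency domain).
    response = average_magnitude_response(speech)
    egg_spec = np.fft.rfft(egg)
    src = np.linspace(0.0, 1.0, len(response))   # normalised frequency axis
    dst = np.linspace(0.0, 1.0, len(egg_spec))
    shaped = np.fft.irfft(egg_spec * np.interp(dst, src, response), n=n)

    # 2. Flatten the shaped signal's own envelope, then impose the speech envelope.
    out = shaped / (amplitude_envelope(shaped) + eps) * amplitude_envelope(speech)

    # 3. Normalise the peak level to just under 1.
    return out / (np.max(np.abs(out)) + eps)
```

Calling `transform_egg(speech, egg)` on two NumPy arrays returns a signal of the shorter input's length. In this sketch, pitch and voicing come from the EGG channel, while overall timbre and loudness contour come from the speech channel.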