Division of Cognitive Science, Department of Philosophy, Lund University, Box 192, SE-221 00, Lund, Sweden.
Behav Res Methods. 2019 Apr;51(2):778-792. doi: 10.3758/s13428-018-1095-7.
Voice synthesis is a useful method for investigating the communicative role of different acoustic features. Although many text-to-speech systems are available, researchers of human nonverbal vocalizations and bioacousticians may profit from a dedicated simple tool for synthesizing and manipulating natural-sounding vocalizations. Soundgen (https://CRAN.R-project.org/package=soundgen) is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app. This tool was validated by comparing the perceived emotion, valence, arousal, and authenticity of 60 recorded human nonverbal vocalizations (screams, moans, laughs, and so on) and their approximate synthetic reproductions. Each synthetic sound was created by manually specifying only a small number of high-level control parameters, such as syllable length and a few anchors for the intonation contour. Nevertheless, the valence and arousal ratings of synthetic sounds were similar to those of the original recordings, and the authenticity ratings were comparable, maintaining parity with the originals for less complex vocalizations. Manipulating the precise acoustic characteristics of synthetic sounds may shed light on the salient predictors of emotion in the human voice. More generally, soundgen may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.
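To illustrate what specifying a sound by "syllable length and a few anchors for the intonation contour" can look like, here is a minimal conceptual sketch in plain Python. It is not soundgen's algorithm and does not call the R package. The function names `interpolate_contour` and `synthesize_syllable`, the linear interpolation between anchors, the 1/k harmonic rolloff, and the half-sine amplitude envelope are all assumptions chosen for brevity.

```python
import math


def interpolate_contour(anchors, n_samples):
    """Linearly interpolate (relative_time, hz) anchors onto n_samples points.

    Assumes at least two anchors whose relative times span 0.0 to 1.0.
    """
    anchors = sorted(anchors)
    contour = []
    seg = 0  # index of the anchor that starts the current segment
    for i in range(n_samples):
        t = i / (n_samples - 1) if n_samples > 1 else 0.0
        # Move to the next segment once t has passed the segment's end anchor.
        while seg < len(anchors) - 2 and t > anchors[seg + 1][0]:
            seg += 1
        (t0, f0), (t1, f1) = anchors[seg], anchors[seg + 1]
        w = min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        contour.append(f0 + w * (f1 - f0))
    return contour


def synthesize_syllable(syl_len_ms, pitch_anchors, sampling_rate=16000,
                        n_harmonics=5):
    """Return one syllable as a list of floats scaled to the range [-1, 1]."""
    n = int(round(syl_len_ms * sampling_rate / 1000))
    f0 = interpolate_contour(pitch_anchors, n)
    phase = 0.0
    samples = []
    for i in range(n):
        # Accumulate phase so that the pitch glides smoothly between anchors.
        phase += 2 * math.pi * f0[i] / sampling_rate
        # Half-sine envelope: fade in from silence and back out again.
        env = math.sin(math.pi * i / (n - 1)) if n > 1 else 1.0
        # Sum a few harmonics with amplitude falling off as 1/k.
        s = sum(math.sin(k * phase) / k for k in range(1, n_harmonics + 1))
        samples.append(env * s)
    peak = max(abs(x) for x in samples) or 1.0
    return [x / peak for x in samples]


# A 300 ms syllable whose pitch rises from 300 Hz to 400 Hz, then falls to 250 Hz.
wave = synthesize_syllable(300, [(0.0, 300.0), (0.5, 400.0), (1.0, 250.0)])
print(len(wave))
```

The whole sound here is determined by one duration and three pitch anchors, the kind of compact, high-level description the abstract refers to. A realistic vocalization would also need formants, noise, and the other parameters that soundgen exposes.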