Soundgen: An open-source tool for synthesizing nonverbal vocalizations.

Affiliation

Division of Cognitive Science, Department of Philosophy, Lund University, Box 192, SE-221 00, Lund, Sweden.

Publication information

Behav Res Methods. 2019 Apr;51(2):778-792. doi: 10.3758/s13428-018-1095-7.

DOI: 10.3758/s13428-018-1095-7

PMID: 30054898

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6478631/

Abstract

Voice synthesis is a useful method for investigating the communicative role of different acoustic features. Although many text-to-speech systems are available, researchers of human nonverbal vocalizations and bioacousticians may profit from a dedicated simple tool for synthesizing and manipulating natural-sounding vocalizations. Soundgen (https://CRAN.R-project.org/package=soundgen) is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app. This tool was validated by comparing the perceived emotion, valence, arousal, and authenticity of 60 recorded human nonverbal vocalizations (screams, moans, laughs, and so on) and their approximate synthetic reproductions. Each synthetic sound was created by manually specifying only a small number of high-level control parameters, such as syllable length and a few anchors for the intonation contour. Nevertheless, the valence and arousal ratings of synthetic sounds were similar to those of the original recordings, and the authenticity ratings were comparable, maintaining parity with the originals for less complex vocalizations. Manipulating the precise acoustic characteristics of synthetic sounds may shed light on the salient predictors of emotion in the human voice. More generally, soundgen may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.
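The abstract describes building each sound from a few high-level parameters, such as syllable length and a handful of anchors for the intonation contour. The sketch below illustrates that idea under clearly stated assumptions: it is not soundgen's code or API (soundgen itself is an R package), the function names `interpolate_anchors` and `synth_syllable` are hypothetical, anchors are assumed to be (relative time, Hz) pairs joined by linear interpolation, and the sound source is a plain harmonic series with a fixed spectral rolloff and no formants, noise, or amplitude envelope.

```python
import math


def interpolate_anchors(anchors, n_points):
    """Hypothetical helper: linearly interpolate (time, value) anchors onto
    n_points evenly spaced positions. Anchor times are relative
    (0 = syllable onset, 1 = offset); values beyond the outer anchors are held."""
    anchors = sorted(anchors)
    contour = []
    for i in range(n_points):
        t = i / (n_points - 1) if n_points > 1 else 0.0
        if t <= anchors[0][0]:
            contour.append(float(anchors[0][1]))
        elif t >= anchors[-1][0]:
            contour.append(float(anchors[-1][1]))
        else:
            # Find the pair of anchors that brackets t and interpolate between them.
            for (t0, v0), (t1, v1) in zip(anchors, anchors[1:]):
                if t0 <= t <= t1:
                    w = (t - t0) / (t1 - t0) if t1 > t0 else 0.0
                    contour.append(v0 + w * (v1 - v0))
                    break
    return contour


def synth_syllable(syl_len_ms, pitch_anchors, samplerate=16000,
                   n_harmonics=10, rolloff_db_per_octave=-12.0):
    """Hypothetical minimal synthesizer: a harmonic series whose f0 follows
    the interpolated pitch contour. Returns samples normalized to peak 1."""
    n = int(samplerate * syl_len_ms / 1000)
    f0 = interpolate_anchors(pitch_anchors, n)
    nyquist = samplerate / 2
    phase = 0.0
    out = []
    for i in range(n):
        # Accumulate phase sample by sample so pitch glides stay continuous.
        phase += 2 * math.pi * f0[i] / samplerate
        sample = 0.0
        for h in range(1, n_harmonics + 1):
            if h * f0[i] >= nyquist:
                break  # drop harmonics at or above Nyquist to avoid aliasing
            # Amplitude falls by rolloff_db_per_octave for each doubling of h.
            amp = 10 ** (rolloff_db_per_octave * math.log2(h) / 20)
            sample += amp * math.sin(h * phase)
        out.append(sample)
    peak = max((abs(x) for x in out), default=0.0) or 1.0
    return [x / peak for x in out]


# A 300 ms syllable whose pitch rises from 200 Hz to 300 Hz, then falls to 180 Hz.
syllable = synth_syllable(300, [(0.0, 200), (0.5, 300), (1.0, 180)])
print(len(syllable))  # 4800 samples at 16 kHz
```

Describing the contour with three anchors instead of a value per sample keeps the number of control parameters small, which is the property the abstract emphasizes; a fuller model would add formant filtering, noise, and an amplitude envelope.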

Similar articles

A Moan of Pleasure Should Be Breathy: The Effect of Voice Quality on the Meaning of Human Nonverbal Vocalizations.

Phonetica. 2020;77(5):327-349. doi: 10.1159/000504855. Epub 2020 Jan 21.

Superior Communication of Positive Emotions Through Nonverbal Vocalisations Compared to Speech Prosody.

J Nonverbal Behav. 2021;45(4):419-454. doi: 10.1007/s10919-021-00375-1. Epub 2021 Jul 24.

Emotional authenticity modulates affective and social trait inferences from voices.

Philos Trans R Soc Lond B Biol Sci. 2021 Dec 20;376(1840):20200402. doi: 10.1098/rstb.2020.0402. Epub 2021 Nov 1.

The link between auditory salience and emotion intensity.

Cogn Emot. 2020 Sep;34(6):1246-1259. doi: 10.1080/02699931.2020.1736992. Epub 2020 Mar 3.

When voices get emotional: a corpus of nonverbal vocalizations for research on emotion processing.

Behav Res Methods. 2013 Dec;45(4):1234-45. doi: 10.3758/s13428-013-0324-3.

Sound context modulates perceived vocal emotion.

Behav Processes. 2020 Mar;172:104042. doi: 10.1016/j.beproc.2020.104042. Epub 2020 Jan 8.

The variably intense vocalizations of affect and emotion (VIVAE) corpus prompts new perspective on nonspeech perception.

Emotion. 2022 Feb;22(1):213-225. doi: 10.1037/emo0001048.

Good vibrations: A review of vocal expressions of positive emotions.

Psychon Bull Rev. 2020 Apr;27(2):237-265. doi: 10.3758/s13423-019-01701-x.

Perceptual and acoustic differences between authentic and acted nonverbal emotional vocalizations.

Q J Exp Psychol (Hove). 2018 Mar;71(3):622-641. doi: 10.1080/17470218.2016.1270976. Epub 2018 Jan 1.

Cited by

Pitch characteristics of real-world infant-directed speech vary with pragmatic context, perceived adult gender, and infant gender.

PLoS One. 2025 Jun 25;20(6):e0326569. doi: 10.1371/journal.pone.0326569. eCollection 2025.

Nonlinear phenomena make animal calls alarming for human listeners.

iScience. 2025 May 7;28(6):112600. doi: 10.1016/j.isci.2025.112600. eCollection 2025 Jun 20.

Acoustic estimation of voice roughness.

Atten Percept Psychophys. 2025 Apr 28. doi: 10.3758/s13414-025-03060-3.

Machine Learning Approach to Identifying Empathy Using the Vocals of Mental Health Helpline Counselors: Algorithm Development and Validation.

JMIR Form Res. 2025 Apr 16;9:e67835. doi: 10.2196/67835.

CoVox: A dataset of contrasting vocalizations.

Behav Res Methods. 2025 Apr 11;57(5):142. doi: 10.3758/s13428-025-02664-9.

How to analyse and manipulate nonlinear phenomena in voice recordings.

Philos Trans R Soc Lond B Biol Sci. 2025 Apr 3;380(1923):20240003. doi: 10.1098/rstb.2024.0003.

Nonlinear phenomena in mammalian vocal communication: an introduction and scoping review.

Philos Trans R Soc Lond B Biol Sci. 2025 Apr 3;380(1923):20240017. doi: 10.1098/rstb.2024.0017.

Acoustic context and dynamics of nonlinear phenomena in mammalian calls: the case of puppy whines.

Philos Trans R Soc Lond B Biol Sci. 2025 Apr 3;380(1923):20240022. doi: 10.1098/rstb.2024.0022.

Nonlinear acoustic phenomena affect the perception of pain in human baby cries.

Philos Trans R Soc Lond B Biol Sci. 2025 Apr 3;380(1923):20240023. doi: 10.1098/rstb.2024.0023.

Nonlinear vocal phenomena and speech intelligibility.

Philos Trans R Soc Lond B Biol Sci. 2025 Apr 3;380(1923):20240254. doi: 10.1098/rstb.2024.0254.

