Independent Researcher, Paris, France.
Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France.
PLoS One. 2019 Apr 4;14(4):e0205943. doi: 10.1371/journal.pone.0205943. eCollection 2019.
Over the past few years, the field of visual social cognition and face processing has been dramatically impacted by a series of data-driven studies employing computer-graphics tools to synthesize arbitrary meaningful facial expressions. In the auditory modality, reverse correlation has traditionally been used to characterize sensory processing at the level of spectral or spectro-temporal stimulus properties, but not higher-level cognitive processing of, e.g., words, sentences or music, for lack of tools able to manipulate the stimulus dimensions relevant to these processes. Here, we present an open-source audio-transformation toolbox, called CLEESE, able to systematically randomize the prosody/melody of existing speech and music recordings. CLEESE works by cutting recordings into small successive time segments (e.g., every successive 100 milliseconds in a spoken utterance) and applying a random parametric transformation to each segment's pitch, duration or amplitude, using a new Python-language implementation of the phase-vocoder digital audio technique. We present two applications of the tool to generate stimuli for studying intonation processing of interrogative vs. declarative speech, and rhythm processing of sung melodies.
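The segment-wise randomization described above can be sketched as building a breakpoint function that assigns an independent random pitch shift to each successive time segment of a recording. The following is a minimal illustrative sketch in NumPy; the function name, parameters, and defaults are assumptions for illustration and do not reproduce the actual CLEESE API.

```python
import numpy as np

def random_pitch_bpf(duration_s, seg_dur_s=0.1, sd_cents=200, seed=None):
    """Illustrative sketch (not the CLEESE API): build a breakpoint function
    of (segment onset time, pitch shift in cents) pairs, drawing one
    independent Gaussian pitch shift per successive segment (e.g. 100 ms),
    as in the segment-wise randomization the abstract describes."""
    rng = np.random.default_rng(seed)
    # onset time of each successive segment
    times = np.arange(0.0, duration_s, seg_dur_s)
    # one random pitch shift (in cents) per segment
    shifts = rng.normal(0.0, sd_cents, size=times.shape)
    return np.column_stack([times, shifts])

# one (onset, cents) pair per 100 ms segment of a 1-second utterance
bpf = random_pitch_bpf(1.0, seg_dur_s=0.1, sd_cents=200, seed=0)
print(bpf.shape)
```

In a reverse-correlation experiment, such a breakpoint function would then be passed to a phase-vocoder transformation to resynthesize the randomized stimulus, and the per-segment shifts would serve as the regressors for the listener's responses.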