Independent Researcher, Paris, France.
Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France.
PLoS One. 2019 Apr 4;14(4):e0205943. doi: 10.1371/journal.pone.0205943. eCollection 2019.
Over the past few years, the field of visual social cognition and face processing has been dramatically impacted by a series of data-driven studies employing computer-graphics tools to synthesize arbitrary meaningful facial expressions. In the auditory modality, reverse correlation has traditionally been used to characterize sensory processing at the level of spectral or spectro-temporal stimulus properties, but not higher-level cognitive processing of, e.g., words, sentences or music, for lack of tools able to manipulate the stimulus dimensions relevant to these processes. Here, we present an open-source audio-transformation toolbox, called CLEESE, able to systematically randomize the prosody/melody of existing speech and music recordings. CLEESE works by cutting recordings into small successive time segments (e.g., every successive 100 milliseconds in a spoken utterance) and applying a random parametric transformation to each segment's pitch, duration or amplitude, using a new Python-language implementation of the phase-vocoder digital audio technique. We present two applications of the tool to generate stimuli for studying intonation processing of interrogative vs. declarative speech, and rhythm processing of sung melodies.
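The segment-wise randomization described above can be sketched as building a breakpoint function that assigns an independent random pitch shift to each successive time segment of a recording. The following is a minimal illustrative sketch in NumPy; the function name, parameters, and defaults are assumptions for illustration and do not reproduce the actual CLEESE API.

```python
import numpy as np

def random_pitch_bpf(duration_s, seg_dur_s=0.1, sd_cents=200, seed=None):
    """Illustrative sketch (not the CLEESE API): build a breakpoint function
    of (segment onset time, pitch shift in cents) pairs, drawing one
    independent Gaussian pitch shift per successive segment (e.g. 100 ms),
    as in the segment-wise randomization the abstract describes."""
    rng = np.random.default_rng(seed)
    # onset time of each successive segment
    times = np.arange(0.0, duration_s, seg_dur_s)
    # one random pitch shift (in cents) per segment
    shifts = rng.normal(0.0, sd_cents, size=times.shape)
    return np.column_stack([times, shifts])

# one (onset, cents) pair per 100 ms segment of a 1-second utterance
bpf = random_pitch_bpf(1.0, seg_dur_s=0.1, sd_cents=200, seed=0)
print(bpf.shape)
```

In a reverse-correlation experiment, such a breakpoint function would then be passed to a phase-vocoder transformation to resynthesize the randomized stimulus, and the per-segment shifts would serve as the regressors for the listener's responses.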