Assaneo M Florencia, Sitt Jacobo, Varoquaux Gael, Sigman Mariano, Cohen Laurent, Trevisan Marcos A
Department of Physics, University of Buenos Aires-IFIBA CONICET, Ciudad Universitaria, Pab. 1, 1428EGA, Buenos Aires, Argentina; Department of Psychology, New York University, New York, NY 10003, USA.
INSERM, Cognitive Neuroimaging Unit, Gif-sur-Yvette, France; Commissariat à l'Energie Atomique, Direction des Sciences du Vivant, I2BM, NeuroSpin Center, Gif-sur-Yvette, France; INSERM U1127, Institut du Cerveau et de la Moelle Épinière, Paris, France; CNRS UMR 7225, Institut du Cerveau et de la Moelle Épinière, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, Paris, France.
Neuroimage. 2016 Nov 1;141:31-39. doi: 10.1016/j.neuroimage.2016.07.033. Epub 2016 Jul 17.
The faculty of language depends on the interplay between the production and perception of speech sounds. A relevant open question is whether the dimensions that organize voice perception in the brain are purely acoustical or depend on properties of the vocal system that produced the sound. One of the main empirical difficulties in answering this question is generating sounds that vary along a continuum according to the anatomical properties of the vocal apparatus that produced them. Here we use a mathematical model that offers the unique possibility of synthesizing vocal sounds by controlling a small set of anatomically based parameters. In a first stage, the quality of the synthetic voice was evaluated. Using specific time traces for sub-glottal pressure and vocal-fold tension, the synthetic voices generated perceptual responses that are indistinguishable from those elicited by real speech. The synthesizer was then used to investigate how the auditory cortex responds to voice depending on the anatomy of the vocal apparatus. Our fMRI results show that sounds are perceived as human vocalizations when produced by a vocal system that follows a simple relationship between the sizes of the vocal folds and the vocal tract. We found that these anatomical parameters encode perceived vocal identity (male, female, child) and show that the brain areas that respond to human speech also encode vocal identity. On the basis of these results, we propose that this low-dimensional model of the vocal system is capable of generating realistic voices and represents a novel tool for exploring voice perception with precise control of the anatomical variables that generate speech. Furthermore, the model provides an explanation of how auditory cortices encode voices in terms of the anatomical parameters of the vocal system.
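The abstract's central idea is a low-dimensional mapping from vocal anatomy (vocal-fold size and vocal-tract size) to perceived vocal identity. As an illustration only, and not the paper's actual model, the sketch below combines two textbook acoustic relations: fundamental frequency falling with vocal-fold length, and formants as quarter-wavelength resonances of a uniform tube of the vocal-tract length, F_n = (2n − 1)·c/(4L). The function name, scaling constant, and example anatomies are illustrative assumptions.

```python
def vocal_parameters(fold_length_cm, tract_length_cm):
    """Toy sketch (not the paper's model): map two anatomical sizes
    to acoustic correlates of vocal identity."""
    # Pitch: string-like scaling, f0 inversely proportional to fold length.
    # The constant 250.0 is illustrative, chosen so a 2 cm fold gives ~125 Hz.
    f0 = 250.0 / fold_length_cm
    # Formants: quarter-wave resonances of a uniform tube,
    # F_n = (2n - 1) * c / (4 * L), with c the speed of sound in cm/s.
    c = 35000.0
    formants = [(2 * n - 1) * c / (4 * tract_length_cm) for n in (1, 2, 3)]
    return f0, formants

# Rough textbook anatomies: larger folds and tract for an adult male,
# smaller for a child. Both pitch and formants shift upward for the child.
for label, folds, tract in [("male", 2.0, 17.0), ("child", 0.9, 11.0)]:
    f0, formants = vocal_parameters(folds, tract)
    print(label, round(f0), [round(f) for f in formants])
```

The point of the sketch is the one the abstract makes: a handful of anatomical parameters, not raw acoustic features, suffice to separate the voice categories (male, female, child) that listeners perceive.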