Vaziri Parisa A, McDougle Samuel D, Clark Damon A
Yale College, Yale University, New Haven, CT 06511.
Dept of Psychology, Yale University, New Haven, CT 06511.
bioRxiv. 2024 Nov 21:2024.08.03.606481. doi: 10.1101/2024.08.03.606481.
To discern speech or appreciate music, the human auditory system detects how pitch increases or decreases over time. However, the algorithms used to detect changes in pitch, or pitch motion, are incompletely understood. Here, using psychophysics, computational modeling, functional neuroimaging, and analysis of recorded speech, we ask whether humans can detect pitch motion using computations analogous to those used by the visual system. We adapted stimuli from studies of vision to create novel auditory correlated noise stimuli that elicited robust pitch motion percepts. Crucially, these stimuli are inharmonic and possess no persistent features across frequency or time, but do possess positive or negative local spectrotemporal correlations in intensity. In psychophysical experiments, we found clear evidence that humans can judge pitch direction based only on positive or negative spectrotemporal intensity correlations. The key behavioral result, robust sensitivity to negative spectrotemporal correlations, is a direct analogue of illusory "reverse-phi" motion in vision and thus constitutes a new auditory illusion. Our behavioral results and computational modeling led us to hypothesize that human auditory processing may employ pitch direction opponency. fMRI measurements in auditory cortex supported this hypothesis. To link our psychophysical findings to real-world pitch perception, we analyzed recordings of English and Mandarin speech and found that pitch direction was robustly signaled by both positive and negative spectrotemporal correlations, suggesting that sensitivity to both types of correlations confers ecological benefits. Overall, this work reveals how motion detection algorithms sensitive to local correlations are deployed by the central nervous system across disparate modalities (vision and audition) and dimensions (space and frequency).
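The core idea, that pitch direction can be read out from local spectrotemporal intensity correlations alone, can be illustrated with a toy model. The sketch below is a simplified illustration under stated assumptions, not the authors' stimuli or model. It represents a spectrogram as a grid of ±1 contrast values (frequency bins × time frames), builds correlated noise in which each bin copies (positive correlation) or inverts (negative correlation) a neighboring bin from the previous frame, and reads the result out with a Hassenstein-Reichardt-style opponent correlator, a standard motion detector from vision swapped in here for illustration. The function names `correlated_noise` and `opponent_correlator`, the binary contrast values, the one-bin/one-frame offset, and the grid size are all hypothetical choices.

```python
import numpy as np


def correlated_noise(n_freq, n_time, direction, parity, rng):
    """Toy correlated-noise 'spectrogram' of +/-1 contrast values (hypothetical).

    direction: +1 for a pattern that shifts up one frequency bin per frame,
               -1 for one that shifts down.
    parity:    +1 copies the neighbouring bin (positive correlation),
               -1 inverts it (negative correlation, the reverse-phi analogue).
    """
    # Start from independent random contrasts; the edge bin that has no
    # source neighbour keeps these random values in every frame.
    c = rng.choice([-1, 1], size=(n_freq, n_time))
    for t in range(1, n_time):
        if direction > 0:
            # bin f at frame t takes its value from bin f-1 at frame t-1
            c[1:, t] = parity * c[:-1, t - 1]
        else:
            # bin f at frame t takes its value from bin f+1 at frame t-1
            c[:-1, t] = parity * c[1:, t - 1]
    return c


def opponent_correlator(c):
    """Hassenstein-Reichardt-style opponent correlator along frequency.

    Each half multiplies one bin's earlier contrast with its neighbour's
    later contrast; subtracting the mirror-image half (opponency) cancels
    responses to uncorrelated noise. Positive = upward, negative = downward.
    """
    up = c[:-1, :-1] * c[1:, 1:]    # lower bin at t times higher bin at t+1
    down = c[1:, :-1] * c[:-1, 1:]  # higher bin at t times lower bin at t+1
    return float(np.mean(up - down))


rng = np.random.default_rng(0)
shape = (128, 1000)  # hypothetical: 128 frequency bins x 1000 frames

print(opponent_correlator(correlated_noise(*shape, +1, +1, rng)))  # near +1: upward
print(opponent_correlator(correlated_noise(*shape, +1, -1, rng)))  # near -1: reversed
print(opponent_correlator(correlated_noise(*shape, -1, +1, rng)))  # near -1: downward
print(opponent_correlator(rng.choice([-1, 1], size=shape)))        # near 0: no motion
```

In this toy, the negative-correlation pattern is displaced upward, yet the correlator reports the opposite sign, which is the reverse-phi signature described above. Subtracting the mirror-image term is what makes the detector opponent: for uncorrelated noise both halves average to roughly zero and cancel.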