Rupp Kyle M, Hect Jasmine L, Harford Emily E, Holt Lori L, Ghuman Avniel Singh, Abel Taylor J
Department of Neurological Surgery, University of Pittsburgh, PA 15213.
Department of Psychology, The University of Texas at Austin, TX 78712.
Proc Natl Acad Sci U S A. 2025 May 6;122(18):e2412243122. doi: 10.1073/pnas.2412243122. Epub 2025 Apr 28.
Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within the auditory cortex, critical unanswered questions remain regarding the organization and dynamics of sound categorization. We performed intracerebral recordings during epilepsy surgery evaluation as 20 patient-participants listened to natural sounds. We then built encoding models to predict neural responses using sound representations extracted from different layers within a deep neural network (DNN) pretrained to categorize sounds from acoustics. This approach yielded accurate models of neural responses throughout the auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. We then characterized the time (relative to sound onset) when feature representations emerged; this measure of temporal dynamics increased across the auditory hierarchy. Finally, we found separable effects of region and temporal dynamics on representational complexity: sites that took longer to begin encoding stimulus features had higher representational complexity independent of region, and downstream regions encoded more complex features independent of temporal dynamics. These findings suggest that hierarchies of timescales and complexity represent a functional organizational principle of the auditory stream underlying our ability to rapidly categorize sounds.
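The layer-wise encoding analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, and ridge regression, 5-fold cross-validation, Pearson correlation as the score, and all names (`ridge_fit`, `cv_score`, `best_layer`, `layer_features`) are assumptions chosen for illustration. The idea shown is only the general one: fit one encoding model per DNN layer for a recording site, then take the index of the best-predicting layer as that site's representational complexity.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression weights (expects centered X and y)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def cv_score(X, y, alpha=1.0, n_folds=5):
    """Mean held-out Pearson correlation between predicted and actual response."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_train, y_train = X[train_mask], y[train_mask]
        # Center using training statistics only, to avoid leaking test data.
        mu_X, mu_y = X_train.mean(axis=0), y_train.mean()
        w = ridge_fit(X_train - mu_X, y_train - mu_y, alpha)
        pred = (X[test_idx] - mu_X) @ w + mu_y
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(scores))

def best_layer(layer_features, response, alpha=1.0):
    """Score one encoding model per layer; return the best layer index and all scores."""
    scores = [cv_score(X, response, alpha) for X in layer_features]
    return int(np.argmax(scores)), scores

# Synthetic demo (hypothetical data, not from the study):
# 200 sounds, 4 "layers" of 10 features each, one simulated recording site
# whose response is driven by layer index 2 plus noise.
rng = np.random.default_rng(0)
n_sounds, n_units, n_layers = 200, 10, 4
layer_features = [rng.standard_normal((n_sounds, n_units)) for _ in range(n_layers)]
true_layer = 2
weights = rng.standard_normal(n_units)
response = layer_features[true_layer] @ weights + 0.5 * rng.standard_normal(n_sounds)

best, scores = best_layer(layer_features, response)
print("best layer:", best)
print("per-layer scores:", [round(s, 2) for s in scores])
```

In this toy setup the simulated site should be assigned the layer that generated its response. Real intracerebral data would additionally require response time courses, regularization tuning, and DNN activations extracted from actual sounds, none of which are modeled here.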