Wingert Jereme C, Parida Satyabrata, Norman-Haignere Sam, David Stephen V
Behavioral and Systems Neuroscience Graduate Program, Oregon Health and Science University, Portland, OR 97239, USA.
Oregon Hearing Research Center, Oregon Health and Science University, Portland, OR 97239, USA.
bioRxiv. 2024 Nov 8:2024.11.07.622384. doi: 10.1101/2024.11.07.622384.
Auditory cortex encodes information about nonlinear combinations of spectro-temporal sound features. Convolutional neural networks (CNNs) provide an architecture for generalizable encoding models that can predict time-varying neural activity evoked by natural sounds with substantially greater accuracy than established models. However, the complexity of CNNs makes it difficult to discern the computational properties that support their improved performance. To address this limitation, we developed a method to visualize the tuning subspace captured by a CNN. Single-unit data were recorded using high channel-count microelectrode arrays from primary auditory cortex (A1) of awake, passively listening ferrets during presentation of a large natural sound set. A CNN was fit to the data, replicating approaches from previous work. To measure the tuning subspace, the dynamic spectrotemporal receptive field (dSTRF) was computed as the locally linear filter approximating the input-output relationship of the CNN at each stimulus timepoint. Principal component analysis was then used to reduce this very large set of filters to a smaller subspace, typically requiring 2-10 filters to account for 90% of dSTRF variance. The stimulus was projected into the subspace for each neuron, and a new model was fit using only the projected values. The subspace model predicted time-varying spike rate nearly as accurately as the full CNN. Sensory responses could be plotted in the subspace, providing a compact model visualization. This analysis revealed a diversity of nonlinear responses, consistent with contrast gain control and emergent invariance to spectrotemporal modulation phase. Within local populations, neurons formed a sparse representation by tiling the tuning subspace. Narrow-spiking, putative inhibitory neurons showed distinct patterns of tuning that may reflect their position in the cortical circuit.
These results demonstrate a conceptual link between CNN and subspace models and establish a framework for interpretation of deep learning-based models.
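The analysis pipeline described above can be sketched in a few steps: treat the fitted CNN as a differentiable map from a spectrogram patch to spike rate, take its local gradient at each stimulus timepoint as the dSTRF, apply PCA to the stacked dSTRFs to find the principal filters, and project the stimulus onto the leading filters. The sketch below is not the authors' implementation; the toy two-layer network standing in for the fitted CNN, the finite-difference gradient, and all array shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a fitted CNN encoding model (illustrative only):
# maps a (freq x lag) spectrogram patch to a scalar spike rate.
W1 = rng.normal(size=(8, 12, 5))   # 8 hidden units, 12 freq bins, 5 time lags
w2 = rng.normal(size=8)

def model(patch):
    h = np.tanh(np.tensordot(W1, patch, axes=([1, 2], [0, 1])))
    return w2 @ h

def dstrf(patch, eps=1e-5):
    """Locally linear filter: numerical gradient of the model output
    with respect to each spectrogram bin at this stimulus timepoint."""
    g = np.zeros_like(patch)
    for i in range(patch.shape[0]):
        for j in range(patch.shape[1]):
            d = np.zeros_like(patch)
            d[i, j] = eps
            g[i, j] = (model(patch + d) - model(patch - d)) / (2 * eps)
    return g

# dSTRFs across many stimulus timepoints, flattened to vectors
stim = rng.normal(size=(200, 12, 5))              # 200 stimulus timepoints
D = np.stack([dstrf(p).ravel() for p in stim])

# PCA of the dSTRF set: principal filters span the tuning subspace
D0 = D - D.mean(axis=0)
U, s, Vt = np.linalg.svd(D0, full_matrices=False)
var = s**2 / (s**2).sum()
k = int(np.searchsorted(np.cumsum(var), 0.90)) + 1  # filters for 90% variance

# Project the stimulus into the k-dimensional subspace; a small model
# refit on these projections approximates the full CNN's prediction.
proj = stim.reshape(len(stim), -1) @ Vt[:k].T
print(k, proj.shape)
```

Because each gradient of this toy model lies in the span of its 8 first-layer filters, PCA recovers a low-dimensional subspace, loosely mirroring the 2-10 filters reported for real A1 neurons.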