Park Jong-Yun, Tsukamoto Mitsuaki, Tanaka Misato, Kamitani Yukiyasu
Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan.
Department of Neuroinformatics, ATR Computational Neuroscience Laboratories, Kyoto, Japan.
PLoS Biol. 2025 Jul 23;23(7):e3003293. doi: 10.1371/journal.pbio.3003293. eCollection 2025 Jul.
Reconstruction of perceptual experiences from brain activity offers a unique window into how population neural responses represent sensory information. Although decoding visual content from functional MRI (fMRI) has seen significant success, reconstructing arbitrary sounds remains challenging because of the fine temporal structure of auditory signals and the coarse temporal resolution of fMRI. Drawing on the hierarchical auditory features of deep neural networks (DNNs), whose progressively larger time windows correspond to neural activity, we introduce a sound reconstruction method that integrates brain decoding of DNN features with an audio-generative model. DNN features decoded from auditory cortical activity outperformed spectrotemporal and modulation-based features, enabling perceptually plausible reconstructions across diverse sound categories. Behavioral evaluations and objective measures confirmed that these reconstructions preserved short-term spectral and perceptual properties, capturing the characteristic timbre of speech, animal calls, and musical instruments, although they did not faithfully reproduce longer temporal sequences. Leave-category-out analyses indicated that the method generalizes across sound categories. Reconstructions at higher DNN layers and from early auditory regions revealed distinct contributions to decoding performance. Applying the model to a selective auditory attention ("cocktail party") task further showed that, in some subjects, reconstructions reflected the attended sound more strongly than the unattended one. Despite its inability to reconstruct exact temporal sequences, which may reflect the limited temporal resolution of fMRI, our framework demonstrates the feasibility of mapping brain activity to auditory experiences, a step toward more comprehensive understanding and reconstruction of internal auditory representations.
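The abstract describes a two-stage pipeline: a decoder maps auditory-cortex fMRI patterns to hierarchical DNN audio features, and an audio-generative model then synthesizes a waveform from the decoded features. The sketch below illustrates only the feature-decoding stage, using ridge regression on synthetic stand-in data; the decoder choice, array shapes, and data are illustrative assumptions, as the abstract does not specify them.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

n_train, n_test = 600, 50      # stimulus presentations (assumed counts)
n_voxels = 2000                # auditory-cortex voxels (assumed)
n_feat = 512                   # units in one DNN layer (assumed)

# Synthetic stand-ins for fMRI responses and the true DNN features of the
# presented sounds; in the actual study these would come from the experiment.
W = rng.normal(size=(n_voxels, n_feat))
X_train = rng.normal(size=(n_train, n_voxels))
X_test = rng.normal(size=(n_test, n_voxels))
Y_train = X_train @ W / np.sqrt(n_voxels) + 0.5 * rng.normal(size=(n_train, n_feat))
Y_test = X_test @ W / np.sqrt(n_voxels) + 0.5 * rng.normal(size=(n_test, n_feat))

# One linear decoder per DNN layer; deeper layers (larger time windows)
# would each get their own model in the full pipeline.
decoder = Ridge(alpha=100.0)
decoder.fit(X_train, Y_train)
Y_pred = decoder.predict(X_test)

print("held-out feature-decoding R^2:", r2_score(Y_test, Y_pred))
# Y_pred would then condition an audio-generative model (e.g., a neural
# vocoder) to synthesize the reconstructed waveform.

Decoding accuracy on held-out stimuli, as computed above, corresponds to the feature-level evaluation that the abstract reports alongside behavioral and objective measures of the reconstructed sounds.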