CerCo, CNRS UMR5549, Toulouse, France.
Université de Toulouse, Toulouse, France.
Sci Rep. 2023 Sep 20;13(1):15666. doi: 10.1038/s41598-023-42891-8.
In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images from fMRI signals. Previous studies have succeeded in recreating different aspects of the visual content, such as low-level properties (shape, texture, layout) or high-level features (object category, descriptive semantics of scenes), but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating highly complex images. Here, we investigate how to take advantage of this technology for brain decoding. We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion), conditioned on predicted multimodal (text and visual) features, to generate the final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling "ROI-optimal" scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g., brain-computer interface) and fundamental neuroscience.
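The two-stage idea can be illustrated with a small sketch of the feature-prediction step only. The abstract does not say how fMRI signals are mapped to VDVAE latents or to the conditioning features, so the closed-form ridge regression below is an assumption made for illustration. All data are synthetic, every name (`fit_ridge`, `predict`, `W_latent`, `W_cond`, the array sizes) is hypothetical, and the VDVAE decoder and the Versatile Diffusion image-to-image step are not implemented here.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def predict(X, W):
    """Apply a fitted linear map to new fMRI patterns."""
    return X @ W


rng = np.random.default_rng(0)
n_train, n_test, n_voxels = 200, 20, 50   # hypothetical sizes
latent_dim, cond_dim = 8, 6               # stand-ins for real feature sizes

# Synthetic fMRI patterns (rows are trials, columns are voxels).
fmri_train = rng.standard_normal((n_train, n_voxels))
fmri_test = rng.standard_normal((n_test, n_voxels))

# Synthetic targets: in a real pipeline these would be features extracted
# from the training images (stage 1 latents) and from images and captions
# (stage 2 conditioning features). Here they are noisy linear functions.
true_latent_map = rng.standard_normal((n_voxels, latent_dim))
true_cond_map = rng.standard_normal((n_voxels, cond_dim))
latents_train = fmri_train @ true_latent_map \
    + 0.1 * rng.standard_normal((n_train, latent_dim))
cond_train = fmri_train @ true_cond_map \
    + 0.1 * rng.standard_normal((n_train, cond_dim))

# Assumed mapping: one linear model per target feature space.
W_latent = fit_ridge(fmri_train, latents_train)
W_cond = fit_ridge(fmri_train, cond_train)

# Stage 1 input: predicted latents, which a decoder would turn into a
# low-level initial image (decoder not included in this sketch).
pred_latents = predict(fmri_test, W_latent)

# Stage 2 input: predicted conditioning features, which would guide the
# diffusion model starting from the stage 1 image (not included either).
pred_cond = predict(fmri_test, W_cond)
```

In this toy setup the targets really are linear in the fMRI patterns, so the predictions are nearly perfect; with real data the quality of both stages depends on how well the predicted features match those of the viewed image.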