Reese Kneeland, Jordyn Ojeda, Ghislain St-Yves, Thomas Naselaris
Department of Computer Science, University of Minnesota, Minneapolis, MN 55455.
Department of Neuroscience, University of Minnesota, Minneapolis, MN 55455.
ArXiv [Preprint]. 2023 Jun 1:arXiv:2306.00927v1.
Two recent developments have accelerated progress in image reconstruction from human brain activity: large datasets that offer samples of brain activity in response to many thousands of natural scenes, and the open-sourcing of powerful stochastic image generators that accept both low- and high-level guidance. Most work in this space has focused on obtaining point estimates of the target image, with the ultimate goal of approximating literal pixel-wise reconstructions of target images from the brain activity patterns they evoke. This emphasis obscures the fact that there is always a family of images that are equally compatible with any evoked brain activity pattern, and the fact that many image generators are inherently stochastic and do not by themselves offer a method for selecting the single best reconstruction from among the samples they generate. We introduce a novel reconstruction procedure (Second Sight) that iteratively refines an image distribution to explicitly maximize the alignment between the predictions of a voxel-wise encoding model and the brain activity patterns evoked by any target image. We use an ensemble of brain-optimized deep neural networks trained on the Natural Scenes Dataset (NSD) as our encoding model, and a latent diffusion model as our image generator. At each iteration, we generate a small library of images and select those that best approximate the measured brain activity when passed through our encoding model. We then extract semantic and structural guidance from the selected images and use it to generate the next library. We show that this process converges on a distribution of high-quality reconstructions by refining both semantic content and low-level image details across iterations. Images sampled from these converged distributions are competitive with those produced by state-of-the-art reconstruction algorithms. Interestingly, the time-to-convergence varies systematically across visual cortex: earlier visual areas generally take longer and converge on narrower image distributions than higher-level brain areas. Second Sight thus offers a succinct and novel method for exploring the diversity of representations across visual brain areas.
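The generate-score-select loop described in the abstract can be summarized in a short sketch. The following Python is a minimal illustration under stated assumptions, not the authors' released implementation: `encode`, `generate_library`, and `extract_guidance` are hypothetical stand-ins for the brain-optimized encoding ensemble, the latent diffusion sampler, and the semantic/structural feature extraction, and the Pearson-correlation score is an assumed alignment metric (the paper's exact metric may differ).

```python
# Minimal sketch of the Second Sight search loop described in the abstract.
# NOT the authors' code: `encode`, `generate_library`, and `extract_guidance`
# are hypothetical callables supplied by the caller, and correlation scoring
# is an assumed stand-in for the paper's alignment measure.
from typing import Callable, Sequence
import numpy as np

def score(pred: np.ndarray, measured: np.ndarray) -> float:
    """Alignment between predicted and measured voxel activity
    (Pearson correlation over voxels; assumed, not from the paper)."""
    pred = (pred - pred.mean()) / (pred.std() + 1e-8)
    measured = (measured - measured.mean()) / (measured.std() + 1e-8)
    return float((pred * measured).mean())

def second_sight_loop(
    measured: np.ndarray,                        # activity evoked by the target image
    encode: Callable[[np.ndarray], np.ndarray],  # image -> predicted voxel pattern
    generate_library: Callable[[object, int], Sequence[np.ndarray]],
    extract_guidance: Callable[[Sequence[np.ndarray]], object],
    n_iters: int = 6,
    library_size: int = 100,
    keep: int = 5,
) -> Sequence[np.ndarray]:
    guidance = None  # the first library is generated without guidance
    selected: Sequence[np.ndarray] = []
    for _ in range(n_iters):
        # 1. Sample a small library of candidate images from the generator.
        library = generate_library(guidance, library_size)
        # 2. Rank candidates by how well their predicted activity (under the
        #    encoding model) matches the measured brain activity.
        scores = [score(encode(img), measured) for img in library]
        order = np.argsort(scores)[::-1][:keep]
        selected = [library[i] for i in order]
        # 3. Distill semantic + structural guidance for the next library.
        guidance = extract_guidance(selected)
    return selected  # samples from the (approximately) converged distribution
```

In the paper, the encoder is an ensemble of brain-optimized deep networks trained on NSD and the guidance conditions a latent diffusion model; the fixed iteration count and the `library_size` and `keep` values above are illustrative placeholders rather than the paper's convergence criterion.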