IEEE Trans Neural Syst Rehabil Eng. 2024;32:1267-1283. doi: 10.1109/TNSRE.2024.3377698. Epub 2024 Mar 22.
The utilization of deep learning techniques for decoding visual perception images from brain activity recorded by functional magnetic resonance imaging (fMRI) has garnered considerable attention in recent research. However, images reconstructed in previous studies still suffer from low quality or unreliability. Moreover, the complexity inherent in fMRI data, characterized by high dimensionality and a low signal-to-noise ratio, poses significant challenges in extracting meaningful visual information for perceptual reconstruction. In this regard, we propose a novel neural decoding model, named the hierarchical semantic generative adversarial network (HS-GAN), inspired by the hierarchical encoding of the visual cortex and the homology theory of convolutional neural networks (CNNs), which is capable of reconstructing perceptual images from fMRI data by leveraging hierarchical and semantic representations. The experimental results demonstrate that HS-GAN achieved the best performance on the Horikawa2017 dataset (histogram similarity: 0.447, SSIM-Acc: 78.9%, Perceptual-Acc: 95.38%, AlexNet(2): 96.24%, and AlexNet(5): 94.82%) over existing advanced methods, indicating improved naturalness and fidelity of the reconstructed images. The versatility of HS-GAN was also highlighted: it demonstrated promising generalization in reconstructing handwritten digits, achieving the highest SSIM (0.783±0.038), thus extending its applicability beyond training solely on natural images.
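The abstract reports several identification-style metrics (SSIM-Acc, Perceptual-Acc, AlexNet(2)/(5)) without specifying the protocol. Below is a minimal, hypothetical Python sketch of the pairwise (two-way) identification accuracy commonly used in fMRI reconstruction work of this kind: a reconstruction counts as correct when it is more similar to its own stimulus than to a distractor. The function name and the stand-in data are illustrative assumptions, not the authors' implementation; replacing SSIM with correlations of intermediate AlexNet features would yield the AlexNet(2)/(5) variants.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def pairwise_identification_acc(recons, targets):
    """Two-way identification accuracy (illustrative sketch, not the
    authors' code): for every ordered pair (i, j), reconstruction i is
    correct if its SSIM with its own target exceeds its SSIM with the
    distractor target j. Returns the fraction of correct comparisons."""
    n = len(recons)
    correct, total = 0, 0
    for i in range(n):
        s_true = ssim(recons[i], targets[i], data_range=1.0)
        for j in range(n):
            if j == i:
                continue
            s_false = ssim(recons[i], targets[j], data_range=1.0)
            correct += s_true > s_false
            total += 1
    return correct / total

# Stand-in data: 10 grayscale 64x64 images in [0, 1]; reconstructions
# are simulated as noisy copies of the targets.
rng = np.random.default_rng(0)
targets = [rng.random((64, 64)) for _ in range(10)]
recons = [np.clip(t + 0.1 * rng.standard_normal(t.shape), 0, 1) for t in targets]
print(f"SSIM-Acc: {pairwise_identification_acc(recons, targets):.1%}")
```

Chance level for this two-way comparison is 50%, so the reported SSIM-Acc of 78.9% indicates that reconstructions are reliably more similar to their own stimuli than to distractors.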