School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.
Media Lab, Massachusetts Institute of Technology, Cambridge, USA.
Sci Rep. 2024 Jul 16;14(1):16436. doi: 10.1038/s41598-024-66228-1.
Recent advances in visual decoding have enabled the classification and reconstruction of perceived images from brain activity. However, previous approaches have predominantly relied on stationary, costly equipment such as fMRI or high-density EEG, limiting the real-world availability and applicability of such projects. Additionally, several EEG-based paradigms have relied on artifactual rather than stimulus-related information, yielding flawed classification and reconstruction results. Our goal was to reduce the cost of the decoding paradigm while increasing its flexibility. Therefore, we investigated whether the classification of an image category and the reconstruction of the image itself are possible from visually evoked brain activity measured by a portable, 8-channel EEG. To compensate for the low electrode count and to avoid flawed predictions, we designed a theory-guided EEG setup and created a new experiment to obtain a dataset from 9 subjects. Using our setup, we compared five contemporary classification models, reaching an average accuracy of 34.4% for 20 image classes on hold-out test recordings. For the reconstruction, the top-performing model was used as an EEG encoder, which was combined with a pretrained latent diffusion model via double-conditioning. After fine-tuning, we reconstructed images from the test set with a 1000-trial, 50-class top-1 accuracy of 35.3%. While not reaching the same performance as MRI-based paradigms on unseen stimuli, our approach greatly improved the affordability and mobility of the visual decoding technology.
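The abstract does not spell out how the 1000-trial, 50-class top-1 accuracy is computed. The following is a minimal sketch of one common protocol for this kind of metric, offered as an assumption rather than the authors' procedure: a pretrained image classifier scores the reconstructed image, and in each trial the ground-truth class competes against 49 randomly drawn distractor classes. The function name `n_trial_k_way_top1` and its parameters are hypothetical. Under this protocol the chance level is 1/k, that is 2% for 50 classes (and 5% for a balanced 20-class classification task).

```python
import numpy as np

def n_trial_k_way_top1(logits, true_class, k=50, n_trials=1000, rng=None):
    """Estimate k-way top-1 accuracy for a single reconstructed image.

    logits:     1-D array of classifier scores over all classes
                (assumed to come from a pretrained image classifier).
    true_class: index of the ground-truth class of the stimulus.
    """
    rng = np.random.default_rng(rng)
    logits = np.asarray(logits, dtype=float)
    # All class indices except the ground truth are potential distractors.
    others = np.delete(np.arange(logits.size), true_class)
    hits = 0
    for _ in range(n_trials):
        # Draw k-1 distinct distractor classes for this trial.
        distractors = rng.choice(others, size=k - 1, replace=False)
        candidates = np.concatenate(([true_class], distractors))
        # The trial counts as a hit if the ground truth scores highest
        # among the k candidates (ties resolve in favour of the ground truth).
        if candidates[np.argmax(logits[candidates])] == true_class:
            hits += 1
    return hits / n_trials
```

Averaging this value over all reconstructed test images would give a dataset-level score. Restricting the comparison to k candidates makes the metric less sensitive to the total number of classes the classifier knows.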