van Vliet Marijn, Rinkinen Oona, Shimizu Takao, Niskanen Anni-Mari, Devereux Barry, Salmelin Riitta
Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland.
School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, United Kingdom.
eLife. 2025 May 13;13:RP96217. doi: 10.7554/eLife.96217.
Traditional models of reading lack a realistic simulation of the early visual processing stages: they take input in the form of letter banks and predefined line segments, which makes them unsuitable for modeling early brain responses. We used variations of the VGG-11 convolutional neural network (CNN) to create models of visual word recognition that start from the pixel level and perform the macro-scale computations needed to go from the detection and segmentation of letter shapes to word-form identification for a large vocabulary of 10k Finnish words, regardless of letter size, shape, or rotation. The models were evaluated against an existing magnetoencephalography (MEG) study in which participants viewed regular words, pseudowords, noise-embedded words, symbol strings, and consonant strings. The original images used in the study were presented to the models, and the activity in the layers was compared to MEG evoked response amplitudes. Through a few alterations that make the network more biologically plausible, we found a CNN architecture that can correctly simulate the behavior of three prominent responses, namely the type I (early visual response), the type II (the 'letter string' response), and the N400m. In conclusion, starting a model of reading with convolution-and-pooling steps provides the flexibility and realism crucial for a direct model-to-brain comparison.
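The comparison between layer activity and MEG evoked response amplitudes can be illustrated with a minimal sketch. The abstract does not state which summary statistic or similarity measure was used, so everything here is an assumption: each stimulus is reduced to one scalar of layer activity, values are averaged per stimulus condition, and the condition means are compared to the MEG amplitudes with a Pearson correlation. The function names and all numbers are hypothetical placeholders, not data or code from the study.

```python
from math import sqrt


def condition_means(activations, labels):
    """Average the per-stimulus layer activity within each stimulus condition."""
    sums, counts = {}, {}
    for value, label in zip(activations, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def layer_to_meg_correlation(activations, labels, meg_amplitudes):
    """Correlate per-condition layer activity with per-condition MEG amplitudes.

    Assumption: the similarity measure is a plain Pearson correlation across
    conditions; the study may have used a different comparison.
    """
    means = condition_means(activations, labels)
    conditions = sorted(meg_amplitudes)  # fixed order shared by both vectors
    return pearson(
        [means[c] for c in conditions],
        [meg_amplitudes[c] for c in conditions],
    )


# Placeholder values invented purely to exercise the functions.
activations = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0]  # one scalar per stimulus image
labels = ["word", "word", "pseudoword", "pseudoword", "symbols", "symbols"]
meg = {"word": 10.0, "pseudoword": 15.0, "symbols": 30.0}  # one amplitude per condition

r = layer_to_meg_correlation(activations, labels, meg)
print(f"layer-to-MEG correlation: {r:.3f}")
```

In practice the per-stimulus scalars would come from a chosen layer of the network, and the same comparison would be repeated for each layer and each evoked response of interest.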