Hakala Tero, Lindh-Knuutila Tiina, Hultén Annika, Lehtonen Minna, Salmelin Riitta
Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland.
Aalto NeuroImaging, Aalto University, Espoo, Finland.
Neurobiol Lang (Camb). 2024 Sep 11;5(4):844-863. doi: 10.1162/nol_a_00149. eCollection 2024.
This study extends the idea of decoding word-evoked brain activations using a corpus-semantic vector space to multimorphemic words in the agglutinative Finnish language. The corpus-semantic models are trained on word segments, and decoding is carried out with word vectors that are composed of these segments. We tested several alternative vector-space models using different segmentations: no segmentation (whole word), linguistic morphemes, statistical morphemes, random segmentation, and character-level 1-, 2- and 3-grams, and paired them with recorded MEG responses to multimorphemic words in a visual word recognition task. For all variants, the decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350–500 ms after stimulus onset. However, the critical segment-label permutation test revealed that only those segmentations that were morphologically aware reached significance in the brain decoding task. The results suggest that both whole-word forms and morphemes are represented in the brain and show that neural decoding using corpus-semantic word representations derived from compositional subword segments is also applicable to multimorphemic word forms. This is especially relevant for languages with complex morphology, because a large proportion of word forms are rare and it can be difficult to find statistically reliable surface representations for them in any large corpus.
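The abstract does not state how segment vectors are combined into word vectors or exactly how the two permutation tests are built. The following Python sketch illustrates one plausible reading, under assumptions that are ours rather than the authors': segment vectors are averaged into a word vector, character n-grams are taken as non-overlapping chunks, and the morpheme segmentations, 2-d vectors and all function names are hypothetical toy values. The word-label null shuffles finished word vectors across words. The segment-label null shuffles vectors across segments before composition, so words that share a segment (here *talo*, "house", in *taloissa* "in the houses" and *talossa* "in the house") still share that vector.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def whole_word(word: str) -> List[str]:
    """No segmentation: the whole word form is a single segment."""
    return [word]


def char_ngrams(word: str, n: int) -> List[str]:
    """Consecutive, non-overlapping character n-grams (assumption);
    the final chunk may be shorter than n."""
    return [word[i:i + n] for i in range(0, len(word), n)]


def random_segmentation(word: str, rng: random.Random, max_cuts: int = 2) -> List[str]:
    """Cut the word at randomly chosen positions: a morphology-blind baseline."""
    if len(word) < 2:
        return [word]
    k = rng.randint(0, min(max_cuts, len(word) - 1))
    cuts = sorted(rng.sample(range(1, len(word)), k))
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


def compose(segments: Sequence[str], seg_vecs: Dict[str, Vector]) -> Vector:
    """Word vector as the mean of its segment vectors (assumed composition rule)."""
    dim = len(seg_vecs[segments[0]])
    total = [0.0] * dim
    for seg in segments:
        for i, x in enumerate(seg_vecs[seg]):
            total[i] += x
    return [x / len(segments) for x in total]


def permute_word_labels(word_vecs: Dict[str, Vector], rng: random.Random) -> Dict[str, Vector]:
    """Word-label null: shuffle whole word vectors across words."""
    words = list(word_vecs)
    vecs = [word_vecs[w] for w in words]
    rng.shuffle(vecs)
    return dict(zip(words, vecs))


def permute_segment_labels(seg_vecs: Dict[str, Vector], rng: random.Random) -> Dict[str, Vector]:
    """Segment-label null: shuffle vectors across segments. Words recomposed
    from the shuffled table still share vectors wherever they share segments."""
    segs = list(seg_vecs)
    vecs = [seg_vecs[s] for s in segs]
    rng.shuffle(vecs)
    return dict(zip(segs, vecs))


if __name__ == "__main__":
    # Hypothetical morpheme segmentations and toy 2-d segment vectors.
    morphs = {"taloissa": ["talo", "i", "ssa"], "talossa": ["talo", "ssa"]}
    seg_vecs = {"talo": [1.0, 0.0], "i": [0.0, 1.0], "ssa": [1.0, 1.0]}
    rng = random.Random(0)

    print(char_ngrams("taloissa", 3))  # ['tal', 'ois', 'sa']
    print(random_segmentation("taloissa", rng))

    words = {w: compose(s, seg_vecs) for w, s in morphs.items()}
    print(words)  # taloissa ≈ [0.667, 0.667], talossa = [1.0, 0.5]

    null_words = {w: compose(s, permute_segment_labels(seg_vecs, rng))
                  for w, s in morphs.items()}
    print(null_words)
```

In this reading, each permuted vector space would be passed through the same decoding pipeline (not shown) to build a null distribution of decoding accuracy. The segment-label null is the stricter of the two because it preserves the overlap between related word forms while breaking the link between each segment and its corpus-derived meaning.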