Kim Najoung, Kim Jung-Ho, Wolters Maria K, MacPherson Sarah E, Park Jong C
School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.
School of Informatics, University of Edinburgh, Edinburgh, United Kingdom.
Front Psychol. 2019 May 16;10:1020. doi: 10.3389/fpsyg.2019.01020. eCollection 2019.
In neuropsychological assessment, semantic fluency is a widely accepted measure of executive function and access to semantic memory. While fluency scores are typically reported as the number of unique words produced, several alternative manual scoring methods have been proposed that provide additional insights into performance, such as clusters of semantically related items. Many automatic scoring methods yield metrics that are difficult to relate to the theories behind manual scoring methods, and most require manually curated linguistic ontologies or large corpus infrastructure. In this paper, we propose a novel automatic scoring method based on Wikipedia, Backlink-VSM, which is easily adaptable to any of the 61 languages with more than 100k Wikipedia entries, can account for cultural differences in semantic relatedness, and covers a wide range of item categories. Our Backlink-VSM method combines relational knowledge as represented by links between Wikipedia entries (Backlink) with a semantic proximity metric derived from distributional representations (vector space model; VSM). Backlink-VSM yields measures that approximate manual clustering and switching analyses, providing a straightforward link to the substantial literature that uses these metrics. We illustrate our approach with examples from two languages (English and Korean) and two commonly used categories of items (animals and fruits). For both Korean and English, we show that the measures generated by our automatic scoring procedure correlate well with manual annotations. We also successfully replicate findings that older adults produce significantly fewer switches than younger adults. Furthermore, our automatic scoring procedure outperforms the manual scoring method and a WordNet-based model in separating younger and older participants, as measured by binary classification accuracy, for both the English and Korean datasets. Our method also generalizes to a different category (fruit), demonstrating its adaptability.
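As a rough illustration of how link-based relatedness and vector-space proximity could jointly drive a clustering and switching analysis, the following minimal sketch segments a fluency response into clusters and counts the switches between them. It is not the authors' implementation: the rule for combining the two signals (a pair is related if it is linked or if its cosine similarity reaches a threshold), the threshold value, the function names, and the toy vectors and link set are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related(w1, w2, links, vectors, threshold):
    """Assumed combination rule: related if linked OR similar enough."""
    if frozenset((w1, w2)) in links:
        return True
    return cosine(vectors[w1], vectors[w2]) >= threshold


def score_sequence(words, links, vectors, threshold=0.8):
    """Split a response into clusters of consecutive related words.

    Returns the clusters and the number of switches, where a switch is
    a transition between two adjacent words that are not related.
    """
    clusters = [[words[0]]] if words else []
    for prev, cur in zip(words, words[1:]):
        if related(prev, cur, links, vectors, threshold):
            clusters[-1].append(cur)
        else:
            clusters.append([cur])
    switches = max(len(clusters) - 1, 0)
    return clusters, switches


# Hypothetical toy data, invented purely for illustration.
vectors = {
    "dog": (1.0, 0.0),
    "cat": (0.9, 0.1),
    "lion": (0.0, 1.0),
    "tiger": (0.1, 0.9),
    "zebra": (1.0, -1.0),
}
# Stand-in for a link between two Wikipedia entries.
links = {frozenset(("tiger", "zebra"))}

clusters, switches = score_sequence(
    ["dog", "cat", "lion", "tiger", "zebra"], links, vectors
)
print(clusters)  # [['dog', 'cat'], ['lion', 'tiger', 'zebra']]
print(switches)  # 1
```

In this toy run, "tiger" and "zebra" have dissimilar vectors but stay in one cluster because of the link between them, which shows why one might combine the two sources of evidence rather than rely on either alone.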