Fan Xi, Reilly Ronan
J Eye Mov Res. 2020 Sep 9;13(6). doi: 10.16910/jemr.13.6.2.
This paper describes the use of semantic similarity measures based on distributed representations of words, sentences, and paragraphs (so-called "embeddings") to assess the impact of supra-lexical factors on eye-movement data from early readers of Chinese. In addition, we used a corpus-based measure of surprisal to assess the impact of local word predictability. Eye-movement data were collected from 56 Chinese students twice: (a) in Grade 4 and (b) one year later, in Grade 5. Results indicated that surprisal and some of the text similarity measures have a significant impact on the moment-to-moment processing of words in reading. The paper presents an easy-to-use set of tools for linking low-level aspects of fixation durations to a hierarchy of sentence-level and paragraph-level features that can be computed automatically. As far as we are aware, the study is the first attempt to track the developmental trajectory of these influences in developing readers across a range of reading abilities. The similarity-based measures described here can be used (a) to measure reader sensitivity to sentence and paragraph cohesion and (b) to assess the suitability of specific texts for readers at different levels of reading ability.
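The two kinds of predictors named in the abstract, corpus-based surprisal and embedding-based similarity, can be illustrated with a minimal Python sketch. It is not the authors' toolset: it assumes a tiny hypothetical corpus of pre-segmented sentences, an add-one-smoothed bigram model for surprisal (-log2 P(word | previous word)), and hypothetical three-dimensional word vectors that are mean-pooled into a sentence vector and compared by cosine similarity. The function names (`train_bigram`, `surprisal`, `mean_vector`, `cosine`) are illustrative only; the paper's actual language model and embedding models are likely different.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and left-context unigrams in word-segmented sentences."""
    bigrams, contexts = Counter(), Counter()
    vocab = set()
    for sent in corpus:
        vocab.update(sent)
        for prev, word in zip(sent, sent[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def surprisal(model, prev, word, alpha=1.0):
    """Surprisal in bits, -log2 P(word | prev), with add-alpha smoothing (an assumed choice)."""
    bigrams, contexts, vocab_size = model
    p = (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * vocab_size)
    return -math.log2(p)


def mean_vector(vectors):
    """Mean-pool word vectors into one sentence (or paragraph) vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical toy corpus of pre-segmented Chinese sentences.
corpus = [
    ["我", "喜欢", "读书"],
    ["我", "喜欢", "画画"],
    ["他", "喜欢", "读书"],
]
model = train_bigram(corpus)

# Hypothetical 3-dimensional word vectors (real embeddings are typically much larger).
emb = {"我": [1.0, 0.0, 0.0], "喜欢": [0.0, 1.0, 0.0], "读书": [0.0, 1.0, 1.0]}
sentence = ["我", "喜欢", "读书"]
sent_vec = mean_vector([emb[w] for w in sentence])

print(round(surprisal(model, "喜欢", "读书"), 3))  # 1.415
print(round(surprisal(model, "喜欢", "画画"), 3))  # 2.0
print(round(cosine(emb["读书"], sent_vec), 3))     # 0.866
```

The more frequent continuation (读书 after 喜欢) receives lower surprisal, and the word-to-sentence cosine stands in for one level of the word/sentence/paragraph similarity hierarchy. Here the sentence vector includes the target word itself; whether to exclude it from its own context vector is a design choice this sketch leaves open.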