Suppr 超能文献

EMTeC: A corpus of eye movements on machine-generated texts.

Authors

Bolliger Lena S, Haller Patrick, Cretton Isabelle C R, Reich David R, Kew Tannon, Jäger Lena A

Affiliations

Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, Zurich, 8050, Switzerland.

Department of Computer Science, University of Potsdam, An der Bahn 2, Potsdam, 14476, Germany.

Publication

Behav Res Methods. 2025 Jun 3;57(7):189. doi: 10.3758/s13428-025-02677-4.

DOI: 10.3758/s13428-025-02677-4
PMID: 40461827
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12134054/
Abstract

The Eye movements on Machine-generated Texts Corpus (EMTeC) is a naturalistic eye-movements-while-reading corpus of 107 native English speakers reading machine-generated texts. The texts are generated by three large language models using five different decoding strategies, and they fall into six different text-type categories. EMTeC entails the eye movement data at all stages of pre-processing, i.e., the raw coordinate data sampled at 2000 Hz, the fixation sequences, and the reading measures. It further provides both the original and a corrected version of the fixation sequences, accounting for vertical calibration drift. Moreover, the corpus includes the language models' internals that underlie the generation of the stimulus texts: the transition scores, the attention scores, and the hidden states. The stimuli are annotated for a range of linguistic features both at text and at word level. We anticipate EMTeC to be utilized for a variety of use cases such as, but not restricted to, the investigation of reading behavior on machine-generated text and the impact of different decoding strategies; reading behavior on different text types; the development of new pre-processing, data filtering, and drift correction algorithms; the cognitive interpretability and enhancement of language models; and the assessment of the predictive power of surprisal and entropy for human reading times. The data at all stages of pre-processing, the model internals, and the code to reproduce the stimulus generation, data pre-processing, and analyses can be accessed via https://github.com/DiLi-Lab/EMTeC/ .
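One use case the abstract names is assessing the predictive power of surprisal for human reading times, which the released transition scores make possible. The sketch below illustrates the general idea only: it converts per-word log-probabilities into surprisal and correlates them with first-pass reading times. The record layout, field names, and values here are hypothetical placeholders, not EMTeC's actual schema; see the repository at https://github.com/DiLi-Lab/EMTeC/ for the real data format.

```python
import math

# Hypothetical per-word records pairing a reading measure (first-pass
# reading time, ms) with the model's transition score (natural-log
# probability) for the aligned token. Illustrative values only.
words = [
    {"word": "The",       "firstpass_ms": 180, "logprob": -1.2},
    {"word": "corpus",    "firstpass_ms": 240, "logprob": -5.8},
    {"word": "contains",  "firstpass_ms": 210, "logprob": -3.1},
    {"word": "eye",       "firstpass_ms": 200, "logprob": -4.0},
    {"word": "movements", "firstpass_ms": 260, "logprob": -6.5},
]

def surprisal(logprob: float) -> float:
    """Surprisal in bits: -log2 p(word | context), from a natural-log score."""
    return -logprob / math.log(2)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

s = [surprisal(w["logprob"]) for w in words]
rt = [w["firstpass_ms"] for w in words]
r = pearson(s, rt)  # correlation of surprisal with first-pass reading time
```

A full analysis would of course use mixed-effects regression over all participants and control predictors such as word length and frequency; the correlation here only shows how the corpus's reading measures and model internals can be joined at the word level.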


[Figures 1–20 of the article are available via the PMC full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12134054/]

Similar Articles

1. EMTeC: A corpus of eye movements on machine-generated texts.
   Behav Res Methods. 2025 Jun 3;57(7):189. doi: 10.3758/s13428-025-02677-4.
2. PoTeC: A German naturalistic eye-tracking-while-reading corpus.
   Behav Res Methods. 2025 Jun 30;57(8):211. doi: 10.3758/s13428-024-02536-8.
3. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
   Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
4. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
   Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
5. Measures implemented in the school setting to contain the COVID-19 pandemic.
   Cochrane Database Syst Rev. 2022 Jan 17;1(1):CD015029. doi: 10.1002/14651858.CD015029.
6. Naturalistic Eye Movement Tasks in Parkinson's Disease: A Systematic Review.
   J Parkinsons Dis. 2024;14(7):1369-1386. doi: 10.3233/JPD-240092.
7. Phonics training for English-speaking poor readers.
   Cochrane Database Syst Rev. 2012 Dec 12;12:CD009115. doi: 10.1002/14651858.CD009115.pub2.
8. Psychological interventions for adults who have sexually offended or are at risk of offending.
   Cochrane Database Syst Rev. 2012 Dec 12;12(12):CD007507. doi: 10.1002/14651858.CD007507.pub2.
9. PDF Entity Annotation Tool (PEAT).
   J Open Source Softw. 2025 Apr 8;10(108):5336. doi: 10.21105/joss.05336.
10. Survivor, family and professional experiences of psychosocial interventions for sexual abuse and violence: a qualitative evidence synthesis.
   Cochrane Database Syst Rev. 2022 Oct 4;10(10):CD013648. doi: 10.1002/14651858.CD013648.pub2.

Cited By

1. Eye Tracking during Passage Reading Supports Precise Oculomotor Assessment in Ataxias.
   medRxiv. 2025 Jan 17:2025.01.13.25320487. doi: 10.1101/2025.01.13.25320487.

References

1. PoTeC: A German naturalistic eye-tracking-while-reading corpus.
   Behav Res Methods. 2025 Jun 30;57(8):211. doi: 10.3758/s13428-024-02536-8.
2. Dual Input Stream Transformer for Vertical Drift Correction in Eye-Tracking Reading Data.
   IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8715-8726. doi: 10.1109/TPAMI.2024.3411938. Epub 2024 Nov 6.
3. Hong Kong Corpus of Chinese Sentence and Passage Reading.
   Sci Data. 2023 Dec 14;10(1):899. doi: 10.1038/s41597-023-02813-9.
4. The Plausibility of Sampling as an Algorithmic Theory of Sentence Processing.
   Open Mind (Camb). 2023 Jul 21;7:350-391. doi: 10.1162/opmi_a_00086. eCollection 2023.
5. TURead: An eye movement dataset of Turkish reading.
   Behav Res Methods. 2024 Mar;56(3):1793-1816. doi: 10.3758/s13428-023-02120-6. Epub 2023 Jul 5.
6. The ZuCo benchmark on cross-subject reading task classification with EEG and eye-tracking data.
   Front Psychol. 2023 Jan 12;13:1028824. doi: 10.3389/fpsyg.2022.1028824. eCollection 2022.
7. CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading.
   Open Mind (Camb). 2022 Jul 1;6:41-50. doi: 10.1162/opmi_a_00054. eCollection 2022.
8. RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese.
   Lang Resour Eval. 2022;56(4):1333-1372. doi: 10.1007/s10579-022-09609-0. Epub 2022 Aug 17.
9. GECO-CN: Ghent Eye-tracking COrpus of sentence reading for Chinese-English bilinguals.
   Behav Res Methods. 2023 Sep;55(6):2743-2763. doi: 10.3758/s13428-022-01931-3. Epub 2022 Jul 27.
10. The database of eye-movement measures on words in Chinese reading.
   Sci Data. 2022 Jul 15;9(1):411. doi: 10.1038/s41597-022-01464-6.