Suppr 超能文献

EMTeC: A corpus of eye movements on machine-generated texts.

Authors

Bolliger Lena S, Haller Patrick, Cretton Isabelle C R, Reich David R, Kew Tannon, Jäger Lena A

Affiliations

Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, Zurich, 8050, Switzerland.

Department of Computer Science, University of Potsdam, An der Bahn 2, Potsdam, 14476, Germany.

Publication

Behav Res Methods. 2025 Jun 3;57(7):189. doi: 10.3758/s13428-025-02677-4.

DOI: 10.3758/s13428-025-02677-4
PMID: 40461827
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12134054/
Abstract

The Eye movements on Machine-generated Texts Corpus (EMTeC) is a naturalistic eye-movements-while-reading corpus of 107 native English speakers reading machine-generated texts. The texts are generated by three large language models using five different decoding strategies, and they fall into six different text-type categories. EMTeC entails the eye movement data at all stages of pre-processing, i.e., the raw coordinate data sampled at 2000 Hz, the fixation sequences, and the reading measures. It further provides both the original and a corrected version of the fixation sequences, accounting for vertical calibration drift. Moreover, the corpus includes the language models' internals that underlie the generation of the stimulus texts: the transition scores, the attention scores, and the hidden states. The stimuli are annotated for a range of linguistic features both at text and at word level. We anticipate EMTeC to be utilized for a variety of use cases such as, but not restricted to, the investigation of reading behavior on machine-generated text and the impact of different decoding strategies; reading behavior on different text types; the development of new pre-processing, data filtering, and drift correction algorithms; the cognitive interpretability and enhancement of language models; and the assessment of the predictive power of surprisal and entropy for human reading times. The data at all stages of pre-processing, the model internals, and the code to reproduce the stimulus generation, data pre-processing, and analyses can be accessed via https://github.com/DiLi-Lab/EMTeC/ .
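One use case the abstract names is assessing the predictive power of surprisal for human reading times, which the released transition scores make possible. The sketch below illustrates the general idea only: it converts per-word log-probabilities into surprisal and correlates them with first-pass reading times. The record layout, field names, and values here are hypothetical placeholders, not EMTeC's actual schema; see the repository at https://github.com/DiLi-Lab/EMTeC/ for the real data format.

```python
import math

# Hypothetical per-word records pairing a reading measure (first-pass
# reading time, ms) with the model's transition score (natural-log
# probability) for the aligned token. Illustrative values only.
words = [
    {"word": "The",       "firstpass_ms": 180, "logprob": -1.2},
    {"word": "corpus",    "firstpass_ms": 240, "logprob": -5.8},
    {"word": "contains",  "firstpass_ms": 210, "logprob": -3.1},
    {"word": "eye",       "firstpass_ms": 200, "logprob": -4.0},
    {"word": "movements", "firstpass_ms": 260, "logprob": -6.5},
]

def surprisal(logprob: float) -> float:
    """Surprisal in bits: -log2 p(word | context), from a natural-log score."""
    return -logprob / math.log(2)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

s = [surprisal(w["logprob"]) for w in words]
rt = [w["firstpass_ms"] for w in words]
r = pearson(s, rt)  # correlation of surprisal with first-pass reading time
```

A full analysis would of course use mixed-effects regression over all participants and control predictors such as word length and frequency; the correlation here only shows how the corpus's reading measures and model internals can be joined at the word level.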


[Figures 1–20 of the article are available via the PMC full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12134054/]

Similar Articles

1. EMTeC: A corpus of eye movements on machine-generated texts.
   Behav Res Methods. 2025 Jun 3;57(7):189. doi: 10.3758/s13428-025-02677-4.
2. PoTeC: A German naturalistic eye-tracking-while-reading corpus.
   Behav Res Methods. 2025 Jun 30;57(8):211. doi: 10.3758/s13428-024-02536-8.
3. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
   Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
4. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
   Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
5. Measures implemented in the school setting to contain the COVID-19 pandemic.
   Cochrane Database Syst Rev. 2022 Jan 17;1(1):CD015029. doi: 10.1002/14651858.CD015029.
6. Naturalistic Eye Movement Tasks in Parkinson's Disease: A Systematic Review.
   J Parkinsons Dis. 2024;14(7):1369-1386. doi: 10.3233/JPD-240092.
7. Phonics training for English-speaking poor readers.
   Cochrane Database Syst Rev. 2012 Dec 12;12:CD009115. doi: 10.1002/14651858.CD009115.pub2.
8. Psychological interventions for adults who have sexually offended or are at risk of offending.
   Cochrane Database Syst Rev. 2012 Dec 12;12(12):CD007507. doi: 10.1002/14651858.CD007507.pub2.
9. PDF Entity Annotation Tool (PEAT).
   J Open Source Softw. 2025 Apr 8;10(108):5336. doi: 10.21105/joss.05336.
10. Survivor, family and professional experiences of psychosocial interventions for sexual abuse and violence: a qualitative evidence synthesis.
   Cochrane Database Syst Rev. 2022 Oct 4;10(10):CD013648. doi: 10.1002/14651858.CD013648.pub2.

Cited By

1. Eye Tracking during Passage Reading Supports Precise Oculomotor Assessment in Ataxias.
   medRxiv. 2025 Jan 17:2025.01.13.25320487. doi: 10.1101/2025.01.13.25320487.

References

1. PoTeC: A German naturalistic eye-tracking-while-reading corpus.
   Behav Res Methods. 2025 Jun 30;57(8):211. doi: 10.3758/s13428-024-02536-8.
2. Dual Input Stream Transformer for Vertical Drift Correction in Eye-Tracking Reading Data.
   IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8715-8726. doi: 10.1109/TPAMI.2024.3411938. Epub 2024 Nov 6.
3. Hong Kong Corpus of Chinese Sentence and Passage Reading.
   Sci Data. 2023 Dec 14;10(1):899. doi: 10.1038/s41597-023-02813-9.
4. The Plausibility of Sampling as an Algorithmic Theory of Sentence Processing.
   Open Mind (Camb). 2023 Jul 21;7:350-391. doi: 10.1162/opmi_a_00086. eCollection 2023.
5. TURead: An eye movement dataset of Turkish reading.
   Behav Res Methods. 2024 Mar;56(3):1793-1816. doi: 10.3758/s13428-023-02120-6. Epub 2023 Jul 5.
6. The ZuCo benchmark on cross-subject reading task classification with EEG and eye-tracking data.
   Front Psychol. 2023 Jan 12;13:1028824. doi: 10.3389/fpsyg.2022.1028824. eCollection 2022.
7. CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading.
   Open Mind (Camb). 2022 Jul 1;6:41-50. doi: 10.1162/opmi_a_00054. eCollection 2022.
8. RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese.
   Lang Resour Eval. 2022;56(4):1333-1372. doi: 10.1007/s10579-022-09609-0. Epub 2022 Aug 17.
9. GECO-CN: Ghent Eye-tracking COrpus of sentence reading for Chinese-English bilinguals.
   Behav Res Methods. 2023 Sep;55(6):2743-2763. doi: 10.3758/s13428-022-01931-3. Epub 2022 Jul 27.
10. The database of eye-movement measures on words in Chinese reading.
   Sci Data. 2022 Jul 15;9(1):411. doi: 10.1038/s41597-022-01464-6.