Jakobi Deborah N, Kern Thomas, Reich David R, Haller Patrick, Jäger Lena A
Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, Zurich, 8050, Switzerland.
Department of Computer Science, University of Potsdam, An der Bahn 2, Potsdam, 14476, Germany.
Behav Res Methods. 2025 Jun 30;57(8):211. doi: 10.3758/s13428-024-02536-8.
The Potsdam Textbook Corpus (PoTeC) is a naturalistic eye-tracking-while-reading corpus containing data from 75 participants reading 12 scientific texts. PoTeC is the first naturalistic eye-tracking-while-reading corpus that contains eye movements from domain experts as well as novices in a within-participant manipulation: it is based on a 2×2×2 fully crossed factorial design, which includes the participants' level of studies and the participants' discipline of studies as between-subjects factors and the text domain as a within-subjects factor. The participants' reading comprehension was assessed by a series of text comprehension questions, and their domain knowledge was tested by text-independent background questions for each of the texts. The materials are annotated for a variety of linguistic features at different levels. We envision PoTeC being used for a wide range of studies, including but not limited to analyses of expert and non-expert reading strategies. The corpus, all accompanying data at all stages of the preprocessing pipeline, and all code used to preprocess the data are made available via GitHub: https://github.com/DiLi-Lab/PoTeC and OSF: https://osf.io/dn5hp/ . The data is furthermore integrated into the open-source package pymovements, which can be used in Python and R: https://github.com/aeye-lab/pymovements .