Human and computer estimations of Predictability of words in written language.

Affiliations

Laboratorio de Inteligencia Artificial Aplicada, Instituto de Ciencias de la Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires - Consejo Nacional de Investigación en Ciencia y Técnica, Ciudad Autónoma de Buenos Aires, Argentina.

Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina.

Publication information

Sci Rep. 2020 Mar 10;10(1):4396. doi: 10.1038/s41598-020-61353-z.

DOI:10.1038/s41598-020-61353-z
PMID:32157161
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7064512/
Abstract

When we read printed text, we are continuously predicting upcoming words to integrate information and guide future eye movements. Thus, the Predictability of a given word has become one of the most important variables when explaining human behaviour and information processing during reading. In parallel, the Natural Language Processing (NLP) field evolved by developing a wide variety of applications. Here, we show that by using different word-embedding techniques (such as Latent Semantic Analysis, Word2Vec, and FastText) and N-gram-based language models, we were able to estimate how humans predict words (cloze-task Predictability) and to better understand eye movements in long Spanish texts. Both types of models partially captured aspects of predictability. On the one hand, our N-gram model performed well when added as a replacement for the cloze-task Predictability of the fixated word. On the other hand, word embeddings were useful to mimic the Predictability of the following word. Our study joins efforts from the neurolinguistic and NLP fields to understand human information processing during reading, with the potential to improve NLP algorithms.
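As a rough illustration of how an N-gram language model can stand in for cloze-task Predictability, the sketch below estimates the probability of a word given the previous word from corpus counts. It is a toy example and not the authors' implementation: the bigram order, the add-one smoothing, the three-sentence corpus, and the function names `train_bigram`, `predictability`, and `surprisal` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences.

    A start marker "<s>" is prepended so the first word of each
    sentence also has a context.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens
        vocab.update(tokens)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def predictability(prev, word, model):
    """Add-one smoothed estimate of P(word | prev)."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def surprisal(prev, word, model):
    """Negative log2 probability: higher means less predictable."""
    return -math.log2(predictability(prev, word, model))


# Hypothetical toy corpus (Spanish, as in the study, but invented here).
corpus = [
    "el gato duerme en la casa".split(),
    "el perro duerme en el patio".split(),
    "el gato come en la cocina".split(),
]
model = train_bigram(corpus)

for candidate in ("gato", "perro", "cocina"):
    p = predictability("el", candidate, model)
    print(f"P({candidate} | el) = {p:.3f}, surprisal = {surprisal('el', candidate, model):.2f} bits")
```

In a reading study, an estimate of this kind would be computed for every word of the text and entered as a predictor of fixation measures in place of the human cloze score. A realistic setup would use a higher-order model with stronger smoothing trained on a large corpus.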


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d72/7064512/1d672e6e09dc/41598_2020_61353_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d72/7064512/7ed990e5cfa9/41598_2020_61353_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d72/7064512/4dcab8b03fe3/41598_2020_61353_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d72/7064512/ec78d682100f/41598_2020_61353_Fig4_HTML.jpg
