A Large-Scale Analysis of Variance in Written Language.

Authors

Johns Brendan T, Jamieson Randall K

Affiliations

Department of Communicative Disorders and Sciences, University at Buffalo.

Department of Psychology, University of Manitoba.

Publication

Cogn Sci. 2018 May;42(4):1360-1374. doi: 10.1111/cogs.12583. Epub 2018 Jan 22.

DOI: 10.1111/cogs.12583
PMID: 29356046

Abstract

The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers, & Tenenbaum, ; Jones & Mewhort, ; Landauer & Dumais, ; Mikolov, Sutskever, Chen, Corrado, & Dean, ). The models treat knowledge as an interaction of processing mechanisms and the structure of language experience. But language experience is often treated agnostically. We report a distributional semantic analysis that shows written language in fiction books varies appreciably between books from the different genres, books from the same genre, and even books written by the same author. Given that current theories assume that word knowledge reflects an interaction between processing mechanisms and the language environment, the analysis shows the need for the field to engage in a more deliberate consideration and curation of the corpora used in computational studies of natural language processing.
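The abstract does not spell out the analysis pipeline. As a rough illustration only, and not the authors' method, here is a minimal sketch of how a distributional comparison can expose differences between two texts. It builds a co-occurrence vector for the same word in each text and compares the two vectors with cosine similarity. The toy sentences, the window size of 2, and the function names are hypothetical.

```python
from collections import Counter
import math


def cooccurrence_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target word itself
                counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "books": the same word appears in different contexts.
book_a = "the ship sailed across the dark sea toward the harbor".split()
book_b = "the ship of the empire jumped across the dark void of space".split()

vec_a = cooccurrence_vector(book_a, "ship")
vec_b = cooccurrence_vector(book_b, "ship")
print(round(cosine(vec_a, vec_b), 3))  # prints 0.516
```

In these toy texts the two contexts of "ship" share only "the", so the similarity is about 0.516. A realistic analysis would use whole books and many target words, but the same idea underlies the claim that word representations depend on the text a model is trained on.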

Similar Articles

1. A Large-Scale Analysis of Variance in Written Language.
   Cogn Sci. 2018 May;42(4):1360-1374. doi: 10.1111/cogs.12583. Epub 2018 Jan 22.
2. The Role of Negative Information in Distributional Semantic Learning.
   Cogn Sci. 2019 May;43(5):e12730. doi: 10.1111/cogs.12730.
3. Estimating the average need of semantic knowledge from distributional semantic models.
   Mem Cognit. 2017 Nov;45(8):1350-1370. doi: 10.3758/s13421-017-0732-1.
4. The principals of meaning: Extracting semantic dimensions from co-occurrence models of semantics.
   Psychon Bull Rev. 2016 Dec;23(6):1744-1756. doi: 10.3758/s13423-016-1053-2.
5. The hidden Markov Topic model: a probabilistic model of semantic representation.
   Top Cogn Sci. 2010 Jan;2(1):101-13. doi: 10.1111/j.1756-8765.2009.01074.x.
6. Mining a Crowdsourced Dictionary to Understand Consistency and Preference in Word Meanings.
   Front Psychol. 2019 Feb 18;10:268. doi: 10.3389/fpsyg.2019.00268. eCollection 2019.
7. The influence of place and time on lexical behavior: A distributional analysis.
   Behav Res Methods. 2019 Dec;51(6):2438-2453. doi: 10.3758/s13428-019-01289-z.
8. Distributional social semantics: Inferring word meanings from communication patterns.
   Cogn Psychol. 2021 Dec;131:101441. doi: 10.1016/j.cogpsych.2021.101441. Epub 2021 Oct 16.
9. Determining the optimal environmental information for training computational models of lexical semantics and lexical organization.
   Can J Exp Psychol. 2024 Sep;78(3):163-173. doi: 10.1037/cep0000344.
10. Multimodal Word Meaning Induction From Minimal Exposure to Natural Text.
    Cogn Sci. 2017 Apr;41 Suppl 4:677-705. doi: 10.1111/cogs.12481. Epub 2017 Mar 21.

Cited By

1. A comparison of word humor ratings across speakers of North American, British, and Singapore English.
   Mem Cognit. 2025 Feb;53(2):568-589. doi: 10.3758/s13421-024-01587-8. Epub 2024 Jun 12.
2. Assessment of Hypertensive Patients' Complex Metabolic Status Using Data Mining Methods.
   J Cardiovasc Dev Dis. 2023 Aug 13;10(8):345. doi: 10.3390/jcdd10080345.
3. Determining the prevalence of childhood hypertension and its concomitant metabolic abnormalities using data mining methods in the Northeastern region of Hungary.
   Front Cardiovasc Med. 2023 Jan 10;9:1081986. doi: 10.3389/fcvm.2022.1081986. eCollection 2022.
4. Contextual dynamics in lexical encoding across the ageing spectrum: A simulation study.
   Q J Exp Psychol (Hove). 2023 Sep;76(9):2164-2182. doi: 10.1177/17470218221145685. Epub 2022 Dec 27.
5. Exploring the Relationship Between Fiction Reading and Emotion Recognition.
   Affect Sci. 2021 Apr 20;2(2):178-186. doi: 10.1007/s42761-021-00034-0. eCollection 2021 Jun.
6. Accounting for item-level variance in recognition memory: Comparing word frequency and contextual diversity.
   Mem Cognit. 2022 Jul;50(5):1013-1032. doi: 10.3758/s13421-021-01249-z. Epub 2021 Nov 22.
7. Mining a Crowdsourced Dictionary to Understand Consistency and Preference in Word Meanings.
   Front Psychol. 2019 Feb 18;10:268. doi: 10.3389/fpsyg.2019.00268. eCollection 2019.
8. A Large-Scale Semantic Analysis of Verbal Fluency Across the Aging Spectrum: Data From the Canadian Longitudinal Study on Aging.
   J Gerontol B Psychol Sci Soc Sci. 2020 Oct 16;75(9):e221-e230. doi: 10.1093/geronb/gbz003.
9. Using experiential optimization to build lexical representations.
   Psychon Bull Rev. 2019 Feb;26(1):103-126. doi: 10.3758/s13423-018-1501-2.