Facultad de Psicología, Universidad Nacional Autónoma de México, Mexico City, Mexico.
Behav Res Methods. 2024 Mar;56(3):2486-2498. doi: 10.3758/s13428-023-02160-y. Epub 2023 Jul 5.
Sentence-final completion tasks are valuable tools for studying language processing and its associated predictive mechanisms. Established sentence-completion norms exist for languages such as English, Portuguese, French, and Spanish, each tailored to and evaluated in the language it was designed for. Yet cultural variation among native speakers of the same language casts doubt on whether such norms can be applied universally. In this study, we developed a corpus of 2925 sentence-completion norms specifically for Mexican Spanish. This corpus is distinctive for several reasons. First, it is the most comprehensive set of sentence-completion norms for Mexican Spanish to date. Second, it offers a large range of experimental stimuli with considerable variability in the predictability of the sentence-final word (cloze probability/surprisal) and in the level of uncertainty inherent in the sentence context (entropy). Third, the sentences vary in syntactic complexity, and the sentence-final nouns vary in concreteness/abstractness, length, and frequency. This paper details how the sentence contexts were generated, explains the methodology used to collect data from a total of 1470 participants, and outlines the data-analysis approach used to establish the sentence-completion norms. These norms contribute to fields such as linguistics, cognitive science, and machine learning, among others, by advancing our understanding of language, predictive mechanisms, knowledge representation, and context representation. The collected data are available on the Open Science Framework (OSF) at the following link: https://osf.io/js359/?view_only=bb1b328d37d643df903ed69bb2405ac0 .
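The three predictability measures named in the abstract (cloze probability, surprisal, and entropy) can be illustrated with a minimal Python sketch for a single sentence context. It assumes that cloze probability is the proportion of participants who produced a given word, that surprisal is the negative base-2 logarithm of that proportion, and that entropy is the Shannon entropy (in bits) of the distribution of all completions for the context. The function name `completion_norms` and the simple lowercase/whitespace normalization are hypothetical illustrations, not the authors' procedure, which may use a different log base or merge response variants differently.

```python
import math
from collections import Counter


def completion_norms(responses, target):
    """Sketch: cloze probability, surprisal, and entropy for one sentence context.

    responses: completion words produced by participants for this context.
    target: the word whose cloze probability and surprisal are computed.
    Hypothetical helper; normalization here is a simplifying assumption.
    """
    # Assumed normalization: trim whitespace, lowercase, drop empty answers.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    if not cleaned:
        raise ValueError("no usable responses for this context")
    counts = Counter(cleaned)
    n = len(cleaned)

    # Cloze probability: share of participants who produced the target word.
    cloze = counts[target.strip().lower()] / n

    # Surprisal in bits; infinite when no participant produced the target.
    surprisal = -math.log2(cloze) if cloze > 0 else math.inf

    # Shannon entropy (bits) over the distribution of distinct completions.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return cloze, surprisal, entropy


if __name__ == "__main__":
    answers = ["pan", "pan", "Pan ", "leche"]
    print(completion_norms(answers, "pan"))
```

In this toy usage, three of four participants complete the context with "pan", giving a cloze probability of 0.75. A context where every participant gives the same word has zero entropy, while one with many different completions has high entropy even if the target's cloze probability is held fixed. That is why the two kinds of measure are reported separately.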