Department of Otolaryngology, Washington University in St. Louis, 660 South Euclid, Box 8115, Saint Louis, MO, 63110, USA.
Department of Neurology, Washington University in St. Louis, St. Louis, MO, USA.
Behav Res Methods. 2020 Aug;52(4):1795-1799. doi: 10.3758/s13428-020-01351-1.
In everyday language processing, sentence context affects how readers and listeners process upcoming words. In experimental situations, it can be useful to identify words that are predicted to greater or lesser degrees by the preceding context. Here we report completion norms for 3,085 English sentences, collected online using a written cloze procedure in which participants were asked to provide their best guess for the word completing a sentence. Sentences ranged from eight to ten words in length. At least 100 unique participants contributed to each sentence. All responses were reviewed by human raters to mitigate the influence of misspellings and typographical errors. The responses provide a range of predictability values for 13,438 unique target words, 6,790 of which appear in more than one sentence context. We also provide entropy values based on the relative predictability of multiple responses. A searchable set of norms is available at http://sentencenorms.net. Finally, we provide the code used to collate and organize the responses to facilitate additional analyses and future research projects.
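As an illustration of the two measures mentioned above, the following minimal sketch computes cloze probabilities (the proportion of participants giving each response) and a response entropy for a single sentence. It is not the authors' released code. The function name `cloze_stats`, the lowercase-and-strip normalization, the use of Shannon entropy in bits (base 2), and the sample responses are all assumptions made for this example; the published norms relied on human review of responses rather than automatic cleaning.

```python
from collections import Counter
from math import log2


def cloze_stats(responses):
    """Return (cloze probabilities, entropy in bits) for one sentence.

    Hypothetical helper for illustration only; not part of the published code.
    """
    # Assumed normalization: trim whitespace, lowercase, drop empty responses.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    if not cleaned:
        raise ValueError("no usable responses")

    counts = Counter(cleaned)
    total = sum(counts.values())

    # Cloze probability: share of participants who produced each word.
    probs = {word: n / total for word, n in counts.items()}

    # Shannon entropy over the response distribution (base 2 is an assumption).
    entropy = -sum(p * log2(p) for p in probs.values())
    return probs, entropy


# Made-up responses for one sentence frame, not data from the norms.
responses = ["dog", "Dog ", "cat", "bird"]
probs, entropy = cloze_stats(responses)
print(probs)    # {'dog': 0.5, 'cat': 0.25, 'bird': 0.25}
print(entropy)  # 1.5
```

Under this definition, a sentence on which every participant gives the same word has an entropy of 0, and entropy grows as responses are spread more evenly across more words.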