The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions.

Author information

Futrell Richard, Gibson Edward, Tily Harry J, Blank Idan, Vishnevetsky Anastasia, Piantadosi Steven T, Fedorenko Evelina

Affiliations

University of California, Irvine, USA.

Massachusetts Institute of Technology, Cambridge, USA.

Publication information

Lang Resour Eval. 2021;55(1):63-77. doi: 10.1007/s10579-020-09503-7. Epub 2020 Sep 4.

Abstract

It is now common practice to compare models of human language processing by how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on naturally occurring text, contain few of the low-frequency syntactic constructions that are often required to distinguish between processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected Penn Treebank-style parse trees and includes self-paced reading time data and aligned audio recordings. We give an overview of the content of the corpus, review recent work using the corpus, and release the data.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238e/8549930/afb7c0587f14/10579_2020_9503_Fig1_HTML.jpg