Rajkumar Rajakrishnan, van Schijndel Marten, White Michael, Schuler William
Department of Humanities and Social Sciences, IIT Delhi, Hauz Khas, New Delhi 110016, India.
Department of Linguistics, The Ohio State University, Oxley Hall, 1712 Neil Ave., Columbus, OH 43210, USA.
Cognition. 2016 Oct;155:204-232. doi: 10.1016/j.cognition.2016.06.008. Epub 2016 Jul 16.
We investigate the extent to which syntactic choice in written English is influenced by processing considerations as predicted by Gibson's (2000) Dependency Locality Theory (DLT) and Surprisal Theory (Hale, 2001; Levy, 2008). A long line of previous work attests that languages display a tendency for shorter dependencies, and in a previous corpus study, Temperley (2007) provided evidence that this tendency exerts a strong influence on constituent ordering choices. However, Temperley's study included no frequency-based controls, and subsequent work on sentence comprehension with broad-coverage eye-tracking corpora found weak or negative effects of DLT-based measures when frequency effects were statistically controlled for (Demberg & Keller, 2008; van Schijndel, Nguyen, & Schuler, 2013; van Schijndel & Schuler, 2013), calling into question the actual impact of dependency locality on syntactic choice phenomena. Going beyond Temperley's work, we show that DLT integration costs are indeed a significant predictor of syntactic choice in written English even in the presence of competing frequency-based and cognitively motivated control factors, including n-gram probability and PCFG surprisal as well as embedding depth (Wu, Bachrach, Cardenas, & Schuler, 2010; Yngve, 1960). Our study also shows that the predictions of dependency length and surprisal are only moderately correlated, a finding which mirrors Demberg & Keller's (2008) results for sentence comprehension. Further, we demonstrate that the efficacy of dependency length in predicting the corpus choice increases with increasing head-dependent distances. At the same time, we find that the tendency towards dependency locality is not always observed, and with pre-verbal adjuncts in particular, non-locality cases are found more often than not. In contrast, surprisal is effective in these cases, and the embedding depth measures further increase prediction accuracy.
We discuss the implications of our findings for theories of language comprehension and production, and conclude with a discussion of questions our work raises for future research.