Department of Linguistics, University of Michigan, Ann Arbor, MI 48109-1220, USA.
Wiley Interdiscip Rev Cogn Sci. 2011 May;2(3):315-322. doi: 10.1002/wcs.111. Epub 2010 Sep 22.
The term "statistical methods" here refers to a methodology that has been dominant in computational linguistics since about 1990. It is characterized by the use of stochastic models, substantial data sets, machine learning, and rigorous experimental evaluation. The shift to statistical methods in computational linguistics parallels a movement in artificial intelligence more broadly. Statistical methods have so thoroughly permeated computational linguistics that almost all work in the field draws on them in some way. There has, however, been little penetration of the methods into general linguistics. The methods themselves are largely borrowed from machine learning and information theory. We limit attention to methods that are directly applicable to language processing, though they are quite general and have many nonlinguistic applications. Not every use of statistics in language processing falls under statistical methods as we use the term. Standard hypothesis testing and experimental design, for example, are not covered in this article.