Andrews Mark, Vigliocco Gabriella
Department of Cognitive, Perceptual, and Brain Sciences, Division of Psychology and Language Sciences, University College London.
Top Cogn Sci. 2010 Jan;2(1):101-13. doi: 10.1111/j.1756-8765.2009.01074.x.
In this paper, we describe a model that learns semantic representations from the distributional statistics of language. This model, however, goes beyond the common bag-of-words paradigm and infers semantic representations by taking into account the inherent sequential nature of linguistic data. The model we describe, which we refer to as a Hidden Markov Topics model, is a natural extension of the current state of the art in Bayesian bag-of-words models, namely the Topics model of Griffiths, Steyvers, and Tenenbaum (2007); it preserves the strengths of that model while extending its scope to incorporate more fine-grained linguistic information.
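To make the contrast with bag-of-words models concrete, the following is a minimal illustrative sketch, not the authors' exact formulation, of a generative process in which (as in the Topics model) topics are distributions over words and each document has its own topic proportions, but successive topic assignments within a document follow a Markov chain rather than being drawn independently. All dimensions, the symbols phi, theta, and A, and the particular way document proportions modulate transitions are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper)
V = 1000   # vocabulary size
K = 20     # number of latent topics
D = 5      # number of documents
N = 50     # words per document

# Topics: each topic is a distribution over the vocabulary (as in LDA/Topics model)
phi = rng.dirichlet(np.full(V, 0.01), size=K)    # shape (K, V)

# Per-document topic proportions (as in the bag-of-words Topics model)
theta = rng.dirichlet(np.full(K, 0.1), size=D)   # shape (D, K)

# Markov transition matrix over topics: the key departure from the
# bag-of-words assumption is that the topic of word t depends on the
# topic of word t-1 (each row is a conditional distribution over topics).
A = rng.dirichlet(np.full(K, 0.1), size=K)       # shape (K, K)

def generate_document(d):
    """Generate one document under a hidden-Markov-style topics sketch."""
    words, topics = [], []
    # First topic drawn from the document's topic proportions
    z = rng.choice(K, p=theta[d])
    for _ in range(N):
        # Emit a word from the current topic's word distribution
        w = rng.choice(V, p=phi[z])
        words.append(w)
        topics.append(z)
        # Next topic depends on the current topic (sequential structure),
        # here modulated by the document's topic proportions (an assumption
        # of this sketch, not a claim about the published model).
        p = A[z] * theta[d]
        z = rng.choice(K, p=p / p.sum())
    return words, topics

docs = [generate_document(d) for d in range(D)]
print(len(docs), "documents; first 10 word ids of doc 0:", docs[0][0][:10])
```

Setting A to a matrix whose rows are all identical to theta[d] would recover an exchangeable, bag-of-words generative process; a non-uniform A is what lets the sketch exploit the sequential structure of the text.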