Steyvers Mark
Department of Cognitive Sciences, University of California, Irvine, 92697-5100, USA.
Acta Psychol (Amst). 2010 Mar;133(3):234-43. doi: 10.1016/j.actpsy.2009.10.010. Epub 2009 Nov 30.
Many psychological theories of semantic cognition assume that concepts are represented by features. The empirical procedures used to elicit features from humans rely on explicit human judgments, which limit the scope of such representations. An alternative computational framework for semantic cognition, one that does not rely on explicit human judgment, is based on the statistical analysis of large text collections. In the topic modeling approach, each document is represented as a mixture of learned topics, and each topic is represented as a probability distribution over words. We propose feature-topic models, in which each document is represented by a mixture of learned topics as well as predefined topics derived from feature norms. Results indicate that this model leads to systematic improvements in generalization tasks. We show that the learned topics in the model play an important role in generalization performance by including words that are not part of current feature norms.
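The mixture representation described above can be illustrated with a minimal sketch. The abstract gives no equations, so this assumes the standard topic-model form in which a document's word distribution is a weighted sum of topic distributions, with some topics fixed in advance (standing in for topics derived from feature norms) and the rest learned from text. The vocabulary, counts, mixture weights, and the function names `normalize` and `mix_topics` are all hypothetical, and no inference procedure is shown.

```python
# Illustrative sketch only: all words, counts, and weights are made up.
vocab = ["dog", "cat", "bark", "fur", "bone"]


def normalize(counts):
    """Turn non-negative counts into a probability distribution over words."""
    total = float(sum(counts))
    return [c / total for c in counts]


def mix_topics(theta, topics):
    """Return p(word | document) = sum_k theta[k] * topics[k][word]."""
    vocab_size = len(topics[0])
    return [
        sum(weight * topic[w] for weight, topic in zip(theta, topics))
        for w in range(vocab_size)
    ]


# Assumption: a predefined topic built from feature-norm style counts
# (here, how often each word was listed as a feature of some concept).
feature_topic = normalize([0, 0, 3, 1, 0])

# Assumption: a topic that would be learned from a text collection.
# It can place probability on words absent from the feature norms.
learned_topic = normalize([2, 2, 1, 1, 4])

topics = [feature_topic, learned_topic]

# Hypothetical document-specific mixture weights over the two topics.
theta = [0.4, 0.6]

word_probs = mix_topics(theta, topics)

for word, prob in zip(vocab, word_probs):
    print(f"{word}: {prob:.3f}")
```

In this toy setup the word "bone" receives probability only through the learned topic, which mirrors the abstract's point that learned topics can cover words missing from the feature norms.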