Center for Mind/Brain Sciences, University of Trento.
Cogn Sci. 2010 Mar;34(2):222-54. doi: 10.1111/j.1551-6709.2009.01068.x. Epub 2009 Sep 30.
Computational models of meaning trained on naturally occurring text successfully model human performance on tasks involving simple similarity measures, but they characterize meaning in terms of undifferentiated bags of words or topical dimensions. This has led some to question their psychological plausibility (Murphy, 2002; Schunn, 1999). We present here a fully automatic method for extracting a structured and comprehensive set of concept descriptions directly from an English part-of-speech-tagged corpus. Concepts are characterized by weighted properties, enriched with concept-property types that approximate classical relations such as hypernymy and function. Our model outperforms comparable algorithms in cognitive tasks pertaining not only to concept-internal structures (discovering properties of concepts, grouping properties by property type) but also to inter-concept relations (clustering into superordinates), suggesting the empirical validity of the property-based approach.
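To make the representation described in the abstract concrete, the following is a minimal sketch of concepts stored as weighted, typed properties. It is not the authors' implementation: the `Property` class, the `CONCEPTS` table, the type labels, and all weights are hypothetical toy values chosen for illustration, and cosine similarity is used here only as a simple stand-in for comparing concepts.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Property:
    name: str      # property word, e.g. "animal"
    weight: float  # strength of the concept-property association
    ptype: str     # coarse relation label, e.g. "hypernym" or "function"


# Toy values invented for illustration; they are NOT taken from the paper.
CONCEPTS = {
    "dog": [
        Property("animal", 0.9, "hypernym"),
        Property("bark", 0.8, "function"),
        Property("tail", 0.5, "part"),
    ],
    "cat": [
        Property("animal", 0.9, "hypernym"),
        Property("tail", 0.6, "part"),
    ],
    "hammer": [
        Property("tool", 0.9, "hypernym"),
        Property("hit", 0.7, "function"),
    ],
}


def group_by_type(concept):
    """Group a concept's property names by their property type."""
    groups = {}
    for prop in CONCEPTS[concept]:
        groups.setdefault(prop.ptype, []).append(prop.name)
    return groups


def cosine(a, b):
    """Cosine similarity between two concepts over their property weights."""
    va = {p.name: p.weight for p in CONCEPTS[a]}
    vb = {p.name: p.weight for p in CONCEPTS[b]}
    dot = sum(w * vb[name] for name, w in va.items() if name in vb)
    norm = sqrt(sum(w * w for w in va.values())) * sqrt(sum(w * w for w in vb.values()))
    return dot / norm if norm else 0.0
```

In this toy setting, `group_by_type("dog")` separates the hypernym, function, and part properties, and `cosine("dog", "cat")` is higher than `cosine("dog", "hammer")` because the first pair shares weighted properties while the second shares none.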