Department of Psychology, Stockholm University.
Cogn Sci. 2022 Nov;46(11):e13205. doi: 10.1111/cogs.13205.
The vocabulary for describing odors in English natural language is not well understood, as prior studies of odor descriptions have often relied on preselected descriptors and odor ratings. Here, we present a data-driven approach that automatically identifies English odor descriptors based on their degree of olfactory association, and derive their semantic organization from their distributions in natural texts, using a distributional-semantic language model. We identify 243 descriptors that are much more strongly associated with olfaction than English words in general. We then derive the semantic organization of these olfactory descriptors, and find that it is captured by four clusters that we name Offensive, Malodorous, Fragrant, and Edible. The semantic space derived from our model primarily differentiates descriptors in terms of pleasantness and edibility, the two dimensions along which our four clusters are positioned, and is similar to a space derived from perceptual data. The semantic organization of odor vocabulary can thus be mapped using natural language data (e.g., online text), without the limitations of odor-perceptual data and preselected descriptors. Our method may thus facilitate research on olfaction, a sensory system known to often elude verbal description.
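The abstract does not state how olfactory association is scored or how the clusters are formed, so the following is only a minimal sketch of one plausible pipeline, not the authors' method. It assumes (all hypothetical): toy 3-dimensional word vectors standing in for a distributional-semantic model, a seed list `SEEDS` ("smell", "odor") that anchors olfaction, an association score equal to the mean cosine similarity to those seeds, a cutoff `z` standard deviations above the vocabulary mean (one way to operationalize "much more strongly associated than English words in general"), and plain Euclidean k-means with farthest-first seeding. The toy data only supports two clusters; the study itself reports four.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy word vectors (3 dimensions, made-up values). A real
# analysis would take vectors from a distributional-semantic model
# trained on a large text corpus.
EMBEDDINGS = {
    "smell": [1.0, 0.0, 0.0],
    "odor": [0.9, 0.1, 0.0],
    "stinky": [0.8, -0.5, 0.1],
    "putrid": [0.7, -0.6, 0.0],
    "fragrant": [0.8, 0.5, 0.0],
    "perfumed": [0.7, 0.6, 0.1],
    "table": [0.0, 0.1, 1.0],
    "run": [0.1, 0.0, 0.9],
    "blue": [0.0, 0.2, 0.8],
}
SEEDS = ["smell", "odor"]  # assumed seed words that anchor "olfaction"


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def olfactory_association(word, emb=EMBEDDINGS, seeds=SEEDS):
    """Hypothetical score: mean cosine similarity to the seed words."""
    return mean(cosine(emb[word], emb[s]) for s in seeds)


def select_descriptors(emb=EMBEDDINGS, seeds=SEEDS, z=0.5):
    """Keep non-seed words scoring more than z standard deviations
    above the mean score of the whole non-seed vocabulary."""
    scores = {w: olfactory_association(w, emb, seeds) for w in emb if w not in seeds}
    values = list(scores.values())
    cutoff = mean(values) + z * pstdev(values)
    return sorted(w for w, s in scores.items() if s > cutoff)


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(words, emb=EMBEDDINGS, k=2, iters=100):
    """Plain Euclidean k-means with deterministic farthest-first seeding."""
    points = [emb[w] for w in words]
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all current centroids.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(far)
    assignment = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new == assignment:
            break  # converged: no word changed cluster
        assignment = new
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    return [[w for w, a in zip(words, assignment) if a == j] for j in range(k)]


if __name__ == "__main__":
    descriptors = select_descriptors()
    print(descriptors)  # → ['fragrant', 'perfumed', 'putrid', 'stinky']
    print(kmeans(descriptors, k=2))  # → [['fragrant', 'perfumed'], ['putrid', 'stinky']]
```

On these toy vectors the second dimension plays the role of pleasantness, so the two clusters separate pleasant from unpleasant descriptors; with real embeddings one would set `k=4` and inspect which dimensions separate the clusters.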