Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal.

Author information

Martínez Gonzalo, Molero Juan Diego, González Sandra, Conde Javier, Brysbaert Marc, Reviriego Pedro

Affiliations

Universidad Carlos III de Madrid, Madrid, Spain.

ETSI de Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain.

Publication information

Behav Res Methods. 2024 Dec 4;57(1):5. doi: 10.3758/s13428-024-02515-z.

Abstract

This study investigates the potential of large language models (LLMs) to provide accurate estimates of concreteness, valence, and arousal for multi-word expressions. Unlike previous artificial intelligence (AI) methods, LLMs can capture the nuanced meanings of multi-word expressions. We systematically evaluated GPT-4o's ability to predict concreteness, valence, and arousal. In Study 1, GPT-4o showed strong correlations with human concreteness ratings (r = .8) for multi-word expressions. In Study 2, these findings were repeated for valence and arousal ratings of individual words, matching or outperforming previous AI models. Studies 3-5 extended the valence and arousal analysis to multi-word expressions and showed good validity of the LLM-generated estimates for these stimuli as well. To help researchers with stimulus selection, we provide datasets with LLM-generated norms of concreteness, valence, and arousal for 126,397 English single words and 63,680 multi-word expressions.
