
Comparing perceptual judgments in large multimodal models and humans.

Authors

Dickson Billy, Maini Sahaj Singh, Sanders Craig, Nosofsky Robert, Tiganj Zoran

Affiliations

Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, 700 N Woodlawn Ave, Bloomington, IN, 47408, USA.

Department of Psychological and Brain Sciences, Indiana University Bloomington, Bloomington, IN, USA.

Publication information

Behav Res Methods. 2025 Jun 19;57(7):203. doi: 10.3758/s13428-025-02728-w.

DOI: 10.3758/s13428-025-02728-w
PMID: 40536604
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12178973/
Abstract

Cognitive scientists commonly collect participants' judgments regarding perceptual characteristics of stimuli to develop and evaluate models of attention, memory, learning, and decision-making. For instance, to model human responses in tasks of category learning and item recognition, researchers often collect perceptual judgments of images in order to embed the images in multidimensional feature spaces. This process is time-consuming and costly. Recent advancements in large multimodal models (LMMs) provide a potential alternative because such models can respond to prompts that include both text and images and could potentially replace human participants. To test whether the available LMMs can indeed be useful for this purpose, we evaluated their judgments on a dataset consisting of rock images that has been widely used by cognitive scientists. The dataset includes human perceptual judgments along 10 dimensions considered important for classifying rock images. Among the LMMs that we investigated, GPT-4o exhibited the strongest positive correlation with human responses and demonstrated promising alignment with the mean ratings from human participants, particularly for elementary dimensions such as lightness, chromaticity, shininess, and fine/coarse grain texture. However, its correlations with human ratings were lower for more abstract and rock-specific emergent dimensions such as organization and pegmatitic structure. Although there is room for further improvement, the model already appears to be approaching the level of consensus observed across human groups for the perceptual features examined here. Our study provides a benchmark for evaluating future LMMs on human perceptual judgment data.
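The comparison described in the abstract, correlating a model's ratings with mean human ratings separately for each perceptual dimension, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the dictionary layout, and every number below are hypothetical placeholders chosen only to show the shape of the computation, and Pearson correlation is assumed here because the abstract does not say which correlation measure was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def per_dimension_correlation(human_mean, model):
    """Correlate model ratings with mean human ratings, one value per dimension.

    Both arguments map a dimension name to a list of ratings, one per image,
    with images in the same order in both dictionaries (an assumed layout).
    """
    return {dim: pearson(human_mean[dim], model[dim]) for dim in human_mean}


# Toy placeholder ratings for four images; NOT data from the study.
human_mean = {
    "lightness": [2.1, 4.8, 6.9, 8.2],
    "organization": [3.0, 5.5, 4.0, 7.0],
}
model = {
    "lightness": [2.0, 5.0, 7.0, 8.0],
    "organization": [5.0, 4.0, 6.0, 5.0],
}

results = per_dimension_correlation(human_mean, model)
for dim, r in results.items():
    print(f"{dim}: r = {r:.2f}")
```

Reporting one coefficient per dimension, rather than a single pooled value, is what lets a comparison of this kind separate elementary dimensions from emergent ones.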


