Putting ChatGPT vision (GPT-4V) to the test: risk perception in traffic images.

Author information

Driessen Tom, Dodou Dimitra, Bazilinskyy Pavlo, de Winter Joost

Affiliations

Delft University of Technology, Delft, Zuid-Holland, The Netherlands.

Eindhoven University of Technology, Eindhoven, Noord-Brabant, The Netherlands.

Publication information

R Soc Open Sci. 2024 May 29;11(5):231676. doi: 10.1098/rsos.231676. eCollection 2024 May.

DOI:10.1098/rsos.231676
PMID:39076815
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11285896/
Abstract

Vision-language models are of interest in various domains, including automated driving, where computer vision techniques can accurately detect road users, but where the vehicle sometimes fails to understand context. This study examined the effectiveness of GPT-4V in predicting the level of 'risk' in traffic images as assessed by humans. We used 210 static images taken from a moving vehicle, each previously rated by approximately 650 people. Based on psychometric construct theory and using insights from the self-consistency prompting method, we formulated three hypotheses: (i) repeating the prompt under effectively identical conditions increases validity, (ii) varying the prompt text and extracting a total score increases validity compared to using a single prompt, and (iii) in a multiple regression analysis, the incorporation of object detection features, alongside the GPT-4V-based risk rating, significantly contributes to improving the model's validity. Validity was quantified by the correlation coefficient with human risk scores, across the 210 images. The results confirmed the three hypotheses. The eventual validity coefficient was r = 0.83, indicating that population-level human risk can be predicted using AI with a high degree of accuracy. The findings suggest that GPT-4V must be prompted in a way equivalent to how humans fill out a multi-item questionnaire.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bf1/11285896/4dcfff76db4b/rsos231676f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bf1/11285896/999a567fd6e1/rsos231676f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bf1/11285896/69c2d9118ea6/rsos231676f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bf1/11285896/b960c3a4b685/rsos231676f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bf1/11285896/a2de40c5b362/rsos231676f05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bf1/11285896/facbe4649e42/rsos231676f06.jpg

Similar articles

1
Putting ChatGPT vision (GPT-4V) to the test: risk perception in traffic images.
R Soc Open Sci. 2024 May 29;11(5):231676. doi: 10.1098/rsos.231676. eCollection 2024 May.
2
Feasibility of Multimodal Artificial Intelligence Using GPT-4 Vision for the Classification of Middle Ear Disease: Qualitative Study and Validation.
JMIR AI. 2024 May 31;3:e58342. doi: 10.2196/58342.
3
Performance of GPT-4 with Vision on Text- and Image-based ACR Diagnostic Radiology In-Training Examination Questions.
Radiology. 2024 Sep;312(3):e240153. doi: 10.1148/radiol.240153.
4
Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.
JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.
5
Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study.
JMIR Med Educ. 2024 Mar 28;10:e57054. doi: 10.2196/57054.
6
Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases.
Clin Neuroradiol. 2024 Dec;34(4):779-787. doi: 10.1007/s00062-024-01426-y. Epub 2024 May 28.
7
ChatGPT's diagnostic performance based on textual vs. visual information compared to radiologists' diagnostic performance in musculoskeletal radiology.
Eur Radiol. 2025 Jan;35(1):506-516. doi: 10.1007/s00330-024-10902-5. Epub 2024 Jul 12.
8
Performance of GPT-4 Vision on kidney pathology exam questions.
Am J Clin Pathol. 2024 Sep 3;162(3):220-226. doi: 10.1093/ajcp/aqae030.
9
Evaluation of GPT Large Language Model Performance on RSNA 2023 Case of the Day Questions.
Radiology. 2024 Oct;313(1):e240609. doi: 10.1148/radiol.240609.
10
A Comparison Between GPT-3.5, GPT-4, and GPT-4V: Can the Large Language Model (ChatGPT) Pass the Japanese Board of Orthopaedic Surgery Examination?
Cureus. 2024 Mar 18;16(3):e56402. doi: 10.7759/cureus.56402. eCollection 2024 Mar.

Cited by

1
Urban walkability through different lenses: A comparative study of GPT-4o and human perceptions.
PLoS One. 2025 Apr 29;20(4):e0322078. doi: 10.1371/journal.pone.0322078. eCollection 2025.
2
Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.
JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.
