Putting ChatGPT vision (GPT-4V) to the test: risk perception in traffic images.

Authors

Driessen Tom, Dodou Dimitra, Bazilinskyy Pavlo, de Winter Joost

Affiliations

Delft University of Technology, Delft, Zuid-Holland, The Netherlands.

Eindhoven University of Technology, Eindhoven, Noord-Brabant, The Netherlands.

Publication information

R Soc Open Sci. 2024 May 29;11(5):231676. doi: 10.1098/rsos.231676. eCollection 2024 May.

Abstract

Vision-language models are of interest in various domains, including automated driving, where computer vision techniques can accurately detect road users, but where the vehicle sometimes fails to understand context. This study examined the effectiveness of GPT-4V in predicting the level of 'risk' in traffic images as assessed by humans. We used 210 static images taken from a moving vehicle, each previously rated by approximately 650 people. Based on psychometric construct theory and using insights from the self-consistency prompting method, we formulated three hypotheses: (i) repeating the prompt under effectively identical conditions increases validity, (ii) varying the prompt text and extracting a total score increases validity compared to using a single prompt, and (iii) in a multiple regression analysis, the incorporation of object detection features, alongside the GPT-4V-based risk rating, significantly contributes to improving the model's validity. Validity was quantified by the correlation coefficient with human risk scores, across the 210 images. The results confirmed the three hypotheses. The eventual validity coefficient was r = 0.83, indicating that population-level human risk can be predicted using AI with a high degree of accuracy. The findings suggest that GPT-4V must be prompted in a way equivalent to how humans fill out a multi-item questionnaire.
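The psychometric reasoning behind hypothesis (i) can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not the study's data or code: it assumes each model rating equals a standardized human risk score plus independent Gaussian noise, and compares the validity (Pearson correlation with the human scores) of a single rating against the mean of several repeated ratings. The function names `pearson` and `simulate`, the noise level, and the number of repeats are all hypothetical choices made for illustration; only the image count of 210 comes from the abstract.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def simulate(n_images=210, n_repeats=10, noise_sd=2.0, seed=0):
    """Compare validity of one noisy rating vs. the mean of repeated ratings.

    Assumption (not from the paper): rating = human score + Gaussian noise.
    """
    rng = random.Random(seed)
    # Synthetic stand-in for the mean human risk score of each image.
    human = [rng.gauss(0.0, 1.0) for _ in range(n_images)]
    # Each image is "prompted" n_repeats times under identical conditions.
    ratings = [
        [h + rng.gauss(0.0, noise_sd) for _ in range(n_repeats)]
        for h in human
    ]
    single = [r[0] for r in ratings]               # first response only
    averaged = [sum(r) / n_repeats for r in ratings]  # mean over repeats
    return pearson(human, single), pearson(human, averaged)


r_single, r_mean = simulate()
print(f"validity of a single rating: r = {r_single:.2f}")
print(f"validity of the mean rating: r = {r_mean:.2f}")
```

Under this noise model, averaging cancels part of the independent error, so the mean rating correlates more strongly with the human scores than any single rating does. The same logic is why a multi-item questionnaire total is usually more valid than a single item.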

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bf1/11285896/4dcfff76db4b/rsos231676f01.jpg
