Urban walkability through different lenses: A comparative study of GPT-4o and human perceptions.

Author information

Wedyan Musab, Yeh Yu-Chen, Saeidi-Rizi Fatemeh, Peng Tai-Quan, Chang Chun-Yen

Affiliations

School of Planning, Design and Construction, Michigan State University, East Lansing, Michigan, United States of America.

Department of Horticulture and Landscape Architecture, National Taiwan University, Taipei City, Taiwan.

Publication information

PLoS One. 2025 Apr 29;20(4):e0322078. doi: 10.1371/journal.pone.0322078. eCollection 2025.

DOI:10.1371/journal.pone.0322078
PMID:40299853
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12040139/
Abstract

Urban environments significantly shape our well-being, behavior, and overall quality of life. Assessing urban environments, particularly walkability, has traditionally relied on computer vision and machine learning algorithms. However, these approaches often fail to capture the subjective and emotional dimensions of walkability, due to their limited ability to integrate human-centered perceptions and contextual understanding. Recently, large language models (LLMs) have gained traction for their ability to process and analyze unstructured data. With the increasing reliance on LLMs in urban studies, it is essential to critically evaluate their potential to accurately capture human perceptions of walkability and contribute to the design of more pedestrian-friendly environments. Therefore, a critical question arises: can large language models (LLMs), such as GPT-4o, accurately reflect human perceptions of urban environments? This study aims to address this question by comparing GPT-4o's evaluations of visual urban scenes with human perceptions, specifically in the context of urban walkability. The research involved human participants and GPT-4o evaluating street-level images based on key dimensions of walkability, including overall walkability, feasibility, accessibility, safety, comfort, and liveliness. To analyze the data, text mining techniques were employed, examining keyword frequency, coherence scores, and similarity indices between the participants and GPT-4o-generated responses. The findings revealed that GPT-4o and participants aligned in their evaluations of overall walkability, feasibility, accessibility, and safety. In contrast, notable differences emerged in the assessment of comfort and liveliness. Human participants demonstrated broader thematic diversity and addressed a wider range of topics, whereas GPT-4o had more focused and cohesive responses, particularly in relation to comfort and safety. 
In addition, similarity scores between GPT-4o and the responses of participants indicated a moderate level of alignment between GPT-4o's reasoning and human judgments. The study concludes that human input remains essential for fully capturing human-centered evaluations of walkability. Furthermore, it underscores the importance of refining LLMs to better align with human perceptions in future walkability studies.
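
The abstract names keyword frequency and similarity indices as part of the text-mining analysis but does not say how they were computed. The following is a minimal sketch of one common approach, assuming a bag-of-words representation and cosine similarity; the stop-word list, the function names, and the sample responses are all hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not the study's).
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in",
              "with", "for", "on", "it", "by", "may"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def keyword_frequency(responses):
    """Count how often each keyword appears across a list of responses."""
    counts = Counter()
    for response in responses:
        counts.update(tokenize(response))
    return counts


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two keyword-frequency vectors (0.0 to 1.0)."""
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[k] * freq_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty response set has no direction to compare.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up responses for illustration only.
    human = ["The sidewalk is wide and shaded by trees",
             "Heavy traffic makes crossing feel unsafe"]
    gpt4o = ["A wide sidewalk with trees offers shade",
             "Traffic volume may reduce pedestrian safety"]

    human_freq = keyword_frequency(human)
    gpt_freq = keyword_frequency(gpt4o)
    print(human_freq.most_common(5))
    print(round(cosine_similarity(human_freq, gpt_freq), 3))
```

A raw-count comparison like this treats "shade" and "shaded" as unrelated words, so a fuller analysis would likely add stemming or embedding-based similarity; the coherence scores mentioned in the abstract would require a topic model and are not covered by this sketch.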


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7fcb/12040139/7f0c41caa236/pone.0322078.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7fcb/12040139/9b7343421250/pone.0322078.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7fcb/12040139/dd8573d2910a/pone.0322078.g002.jpg
