Text understanding in GPT-4 versus humans.

Authors

Shultz Thomas R, Wise Jamie M, Nobandegani Ardavan S

Affiliations

Department of Psychology, McGill University, Montreal, Canada.

School of Computer Science, McGill University, Montreal, Canada.

Publication information

R Soc Open Sci. 2025 Feb 20;12(2):241313. doi: 10.1098/rsos.241313. eCollection 2025 Feb.

DOI:10.1098/rsos.241313
PMID:39980841
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11840437/
Abstract

We examine whether a leading AI system, GPT-4, understands text as well as humans do, first using a well-established standardized test of discourse comprehension. On this test, GPT-4 performs slightly, but not statistically significantly, better than humans given the very high level of human performance. Both GPT-4 and humans make correct inferences about information that is not explicitly stated in the text, a critical test of understanding. Next, we use more difficult passages to determine whether that could allow larger differences between GPT-4 and humans. GPT-4 does considerably better on this more difficult text than do the high school and university students for whom these text passages are designed, as admission tests of student reading comprehension. Deeper exploration of GPT-4's performance on material from one of these admission tests reveals generally accepted signatures of genuine understanding, namely generalization and inference.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8181/11840437/6d880ccf85be/rsos.241313.f002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8181/11840437/08d18b535c58/rsos.241313.f001.jpg
