
Age against the machine-susceptibility of large language models to cognitive impairment: cross sectional analysis.

Authors

Dayan Roy, Uliel Benjamin, Koplewitz Gal

Affiliations

Department of Neurology, Hadassah Medical Center, Jerusalem, Israel.

Faculty of Medicine, Hebrew University, Jerusalem, Israel.

Publication

BMJ. 2024 Dec 19;387:e081948. doi: 10.1136/bmj-2024-081948.

DOI:10.1136/bmj-2024-081948
PMID:39706600
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12128858/

Abstract

OBJECTIVE

To evaluate the cognitive abilities of the leading large language models and identify their susceptibility to cognitive impairment, using the Montreal Cognitive Assessment (MoCA) and additional tests.

DESIGN

Cross sectional analysis.

SETTING

Online interaction with large language models via text based prompts.

PARTICIPANTS

Publicly available large language models, or "chatbots": ChatGPT versions 4 and 4o (developed by OpenAI), Claude 3.5 "Sonnet" (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet).

ASSESSMENTS

The MoCA test (version 8.1) was administered to the leading large language models with instructions identical to those given to human patients. Scoring followed official guidelines and was evaluated by a practising neurologist. Additional assessments included the Navon figure, cookie theft picture, Poppelreuter figure, and Stroop test.

MAIN OUTCOME MEASURES

MoCA scores, performance in visuospatial/executive tasks, and Stroop test results.

RESULTS

ChatGPT 4o achieved the highest score on the MoCA test (26/30), followed by ChatGPT 4 and Claude (25/30), with Gemini 1.0 scoring lowest (16/30). All large language models showed poor performance in visuospatial/executive tasks. Gemini models failed at the delayed recall task. Only ChatGPT 4o succeeded in the incongruent stage of the Stroop test.
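The reported scores can be tabulated against a cutoff to see which models fall below it. The following is a minimal sketch, not the authors' analysis: it assumes the conventional MoCA threshold of 26/30 for a normal result, uses only the four scores stated above (the abstract does not give a score for Gemini 1.5, so it is omitted), and the names `MOCA_SCORES`, `NORMAL_CUTOFF`, and `below_cutoff` are hypothetical.

```python
# Scores reported in the abstract, out of 30.
# Gemini 1.5 is left out because the abstract does not state its score.
MOCA_SCORES = {
    "ChatGPT 4o": 26,
    "ChatGPT 4": 25,
    "Claude 3.5 Sonnet": 25,
    "Gemini 1.0": 16,
}

# Assumption: 26/30 is the conventional MoCA cutoff for a normal result.
NORMAL_CUTOFF = 26


def below_cutoff(scores, cutoff=NORMAL_CUTOFF):
    """Return the names of models scoring below the cutoff, lowest score first."""
    flagged = [name for name, score in scores.items() if score < cutoff]
    # sorted() is stable, so models with equal scores keep their input order.
    return sorted(flagged, key=lambda name: scores[name])


flagged = below_cutoff(MOCA_SCORES)
print(flagged)
```

Under that assumed cutoff, only ChatGPT 4o is not flagged, which matches the exception named in the conclusions.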

CONCLUSIONS

With the exception of ChatGPT 4o, almost all large language models subjected to the MoCA test showed signs of mild cognitive impairment. Moreover, as in humans, age is a key determinant of cognitive decline: "older" chatbots, like older patients, tend to perform worse on the MoCA test. These findings challenge the assumption that artificial intelligence will soon replace human doctors, as the cognitive impairment evident in leading chatbots may affect their reliability in medical diagnostics and undermine patients' confidence.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea1c/12128858/f5c841a8500b/dayr081948.f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea1c/12128858/e35852375e1d/dayr081948.f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea1c/12128858/0756dd07c260/dayr081948.f2.jpg


