Evaluating cognitive performance: Traditional methods vs. ChatGPT.

Authors

Fei Xiao, Tang Ying, Zhang Jianan, Zhou Zhongkai, Yamamoto Ikuo, Zhang Yi

Affiliations

Department of Rehabilitation Medicine, The First People's Hospital of Changzhou, Changzhou, China.

College of Information Science and Engineering, Hohai University, Changzhou, China.

Publication

Digit Health. 2024 Aug 16;10:20552076241264639. doi: 10.1177/20552076241264639. eCollection 2024 Jan-Dec.

DOI: 10.1177/20552076241264639
PMID: 39156049
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11329975/
Abstract

BACKGROUND

NLP models like ChatGPT promise to revolutionize text-based content delivery, particularly in medicine. Yet, doubts remain about ChatGPT's ability to reliably support evaluations of cognitive performance, warranting further investigation into its accuracy and comprehensiveness in this area.

METHOD

A cohort of 60 cognitively normal individuals and 30 stroke survivors underwent a comprehensive evaluation, covering memory, numerical processing, verbal fluency, and abstract thinking. Healthcare professionals and NLP models GPT-3.5 and GPT-4 conducted evaluations following established standards. Scores were compared, and efforts were made to refine scoring protocols and interaction methods to enhance ChatGPT's potential in these evaluations.
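The abstract does not reproduce the study's prompts or interaction protocol, but the general pattern of LLM-assisted scoring it describes can be sketched. Below is a minimal, hypothetical example of asking GPT-4 to score a single verbal-fluency response against a fixed rubric via the OpenAI Python client; the rubric text, model choice, and participant response are illustrative assumptions, not the authors' materials.

```python
# Hypothetical sketch of LLM-assisted scoring of one cognitive-test item.
# The study's actual prompts and scoring protocol are not given in the
# abstract; everything below is an illustrative assumption.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "You are assisting with a cognitive assessment. "
    "Count the number of distinct, valid animal names the participant "
    "produced in 60 seconds and reply with that integer only."
)

participant_response = "dog, cat, horse, dog, elephant, lion"

reply = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": RUBRIC},
        {"role": "user", "content": participant_response},
    ],
    temperature=0,  # deterministic scoring
)
print(reply.choices[0].message.content)  # e.g. "5" (duplicate "dog" excluded)
```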

RESULT

Within the cohort of healthy participants, the utilization of GPT-3.5 revealed significant disparities in memory evaluation compared to both physician-led assessments and those conducted utilizing GPT-4 (P < 0.001). Furthermore, within the domain of memory evaluation, GPT-3.5 exhibited discrepancies in 8 out of 21 specific measures when compared to assessments conducted by physicians (P < 0.05). Additionally, GPT-3.5 demonstrated statistically significant deviations from physician assessments in speech evaluation (P = 0.009). Among participants with a history of stroke, GPT-3.5 exhibited differences solely in verbal assessment compared to physician-led evaluations (P = 0.002). Notably, through the implementation of optimized scoring methodologies and refinement of interaction protocols, partial mitigation of these disparities was achieved.
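The abstract does not name the statistical tests behind these P values. As an illustration only, the sketch below runs a Wilcoxon signed-rank test on made-up paired scores, the kind of paired physician-versus-model comparison the result describes; the data values and the choice of test are assumptions.

```python
# Illustrative paired comparison between physician and GPT-3.5 scores
# for the same participants. The scores are made-up placeholders and
# the test choice is an assumption, not taken from the paper.
import numpy as np
from scipy.stats import wilcoxon

physician = np.array([28, 25, 30, 22, 27, 26, 29, 24])
gpt35     = np.array([25, 24, 27, 20, 26, 23, 27, 22])

stat, p = wilcoxon(physician, gpt35)
print(f"W = {stat:.1f}, p = {p:.3f}")  # p < 0.05 would indicate a
                                       # systematic scoring disparity
```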

CONCLUSION

ChatGPT can produce evaluation outcomes comparable to traditional methods. Despite differences from physician evaluations, refinement of scoring algorithms and interaction protocols has improved alignment. ChatGPT performs well even in populations with specific conditions like stroke, suggesting its versatility. GPT-4 yields results closer to physician ratings, indicating potential for further enhancement. These findings highlight ChatGPT's importance as a supplementary tool, offering new avenues for information gathering in medical fields and guiding its ongoing development and application.

Figures 1–8 (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a273/11329975/8702c67d691c/10.1177_20552076241264639-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a273/11329975/bb360db144cb/10.1177_20552076241264639-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a273/11329975/71d5729e6358/10.1177_20552076241264639-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a273/11329975/bb319a69d0aa/10.1177_20552076241264639-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a273/11329975/4585db7f659f/10.1177_20552076241264639-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a273/11329975/6a6bf10412ba/10.1177_20552076241264639-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a273/11329975/c5791a26c451/10.1177_20552076241264639-fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a273/11329975/7f04a14d59b0/10.1177_20552076241264639-fig8.jpg

Similar Articles

1. Evaluating cognitive performance: Traditional methods vs. ChatGPT.
Digit Health. 2024 Aug 16;10:20552076241264639. doi: 10.1177/20552076241264639. eCollection 2024 Jan-Dec.
2. Performance and exploration of ChatGPT in medical examination, records and education in Chinese: Pave the way for medical AI.
Int J Med Inform. 2023 Sep;177:105173. doi: 10.1016/j.ijmedinf.2023.105173. Epub 2023 Aug 4.
3. ChatGPT's diagnostic performance based on textual vs. visual information compared to radiologists' diagnostic performance in musculoskeletal radiology.
Eur Radiol. 2025 Jan;35(1):506-516. doi: 10.1007/s00330-024-10902-5. Epub 2024 Jul 12.
4. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
5. Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4.
J Med Internet Res. 2024 Jun 27;26:e54571. doi: 10.2196/54571.
6. Accuracy of ChatGPT on Medical Questions in the National Medical Licensing Examination in Japan: Evaluation Study.
JMIR Form Res. 2023 Oct 13;7:e48023. doi: 10.2196/48023.
7. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis.
BMC Med Educ. 2024 Sep 16;24(1):1013. doi: 10.1186/s12909-024-05944-8.
8. Efficacy of ChatGPT in Cantonese Sentiment Analysis: Comparative Study.
J Med Internet Res. 2024 Jan 30;26:e51069. doi: 10.2196/51069.
9. Are Different Versions of ChatGPT's Ability Comparable to the Clinical Diagnosis Presented in Case Reports? A Descriptive Study.
J Multidiscip Healthc. 2023 Dec 6;16:3825-3831. doi: 10.2147/JMDH.S441790. eCollection 2023.
10. How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
JMIR Med Educ. 2023 Feb 8;9:e45312. doi: 10.2196/45312.

Cited By

1. Current Landscape and Future Directions Regarding Generative Large Language Models in Stroke Care: Scoping Review.
JMIR Med Inform. 2025 Aug 7;13:e76636. doi: 10.2196/76636.

References

1. ChatGPT in surgery: a revolutionary innovation?
Surg Today. 2024 Aug;54(8):964-971. doi: 10.1007/s00595-024-02800-6. Epub 2024 Feb 29.
2. Performance of ChatGPT in Israeli Hebrew Internal Medicine National Residency Exam.
Isr Med Assoc J. 2024 Feb;26(2):86-88.
3. Assessing the Effectiveness of ChatGPT in Delivering Mental Health Support: A Qualitative Study.
J Multidiscip Healthc. 2024 Jan 31;17:461-471. doi: 10.2147/JMDH.S447368. eCollection 2024.
4. Uncovering Language Disparity of ChatGPT on Retinal Vascular Disease Classification: Cross-Sectional Study.
J Med Internet Res. 2024 Jan 22;26:e51926. doi: 10.2196/51926.
5. Text Dialogue Analysis for Primary Screening of Mild Cognitive Impairment: Development and Validation Study.
J Med Internet Res. 2023 Dec 29;25:e51501. doi: 10.2196/51501.
6. Novel research and future prospects of artificial intelligence in cancer diagnosis and treatment.
J Hematol Oncol. 2023 Nov 27;16(1):114. doi: 10.1186/s13045-023-01514-5.
7. Leveraging Large Language Models for Decision Support in Personalized Oncology.
JAMA Netw Open. 2023 Nov 1;6(11):e2343689. doi: 10.1001/jamanetworkopen.2023.43689.
8. ChatGPT's dance with neuropsychological data: A case study in Alzheimer's disease.
Ageing Res Rev. 2023 Dec;92:102117. doi: 10.1016/j.arr.2023.102117. Epub 2023 Nov 4.
9. A social robot connected with chatGPT to improve cognitive functioning in ASD subjects.
Front Psychol. 2023 Oct 5;14:1232177. doi: 10.3389/fpsyg.2023.1232177. eCollection 2023.
10. Potential of ChatGPT and GPT-4 for Data Mining of Free-Text CT Reports on Lung Cancer.
Radiology. 2023 Sep;308(3):e231362. doi: 10.1148/radiol.231362.