Chat GPT as a Neuro-Score Calculator: Analysis of a Large Language Model's Performance on Various Neurological Exam Grading Scales.

Authors

Chen Tse Chiang, Kaminski Emily, Koduri Laila, Singer Alyssa, Singer Jorie, Couldwell Mitch, Delashaw Johnny, Dumont Aaron, Wang Arthur

Affiliations

Department of Neurology, Tulane University School of Medicine, New Orleans, Louisiana, USA.

Tulane University School of Medicine, New Orleans, Louisiana, USA.

Publication information

World Neurosurg. 2023 Nov;179:e342-e347. doi: 10.1016/j.wneu.2023.08.088. Epub 2023 Aug 26.

DOI:10.1016/j.wneu.2023.08.088
PMID:37634667
Abstract

BACKGROUND

ChatGPT is a large language model artificial intelligence chatbot that has been applied to different aspects of the medical field. Our study aims to assess how well ChatGPT can score patients from their exam findings on several scales, including the Glasgow Coma Scale (GCS), the intracranial hemorrhage (ICH) score, and the Hunt & Hess (H&H) classification.
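
For orientation, two of the scales named above are simple additive scores, so a deterministic reference calculator is easy to sketch. The Python below uses the standard published definitions of the GCS (eye 1-4, verbal 1-5, motor 1-6) and the ICH score (GCS band, hematoma volume, intraventricular extension, infratentorial origin, age). The function and parameter names are hypothetical, and the abstract does not say how the study's reference scores were produced, so this is an illustrative sketch rather than the authors' method. Hunt & Hess is omitted because it is a grade assigned by clinical judgment, not a sum of components.

```python
def gcs_total(eye: int, verbal: int, motor: int) -> int:
    """Sum the three GCS components (total range 3-15).

    Standard component ranges: eye 1-4, verbal 1-5, motor 1-6.
    """
    limits = {"eye": (eye, 4), "verbal": (verbal, 5), "motor": (motor, 6)}
    for name, (value, upper) in limits.items():
        if not 1 <= value <= upper:
            raise ValueError(f"{name} must be between 1 and {upper}, got {value}")
    return eye + verbal + motor


def ich_score(gcs: int, volume_ml: float, intraventricular: bool,
              infratentorial: bool, age_years: int) -> int:
    """Compute the ICH score (total range 0-6) from its five components."""
    if not 3 <= gcs <= 15:
        raise ValueError(f"gcs must be between 3 and 15, got {gcs}")
    # GCS band: 3-4 gives 2 points, 5-12 gives 1 point, 13-15 gives 0 points.
    if gcs <= 4:
        points = 2
    elif gcs <= 12:
        points = 1
    else:
        points = 0
    points += 1 if volume_ml >= 30 else 0   # hematoma volume of 30 mL or more
    points += 1 if intraventricular else 0  # intraventricular hemorrhage present
    points += 1 if infratentorial else 0    # infratentorial origin
    points += 1 if age_years >= 80 else 0   # age 80 years or older
    return points


if __name__ == "__main__":
    # Hypothetical patient, not taken from the study's test cases.
    gcs = gcs_total(eye=2, verbal=3, motor=5)
    print("GCS:", gcs)
    print("ICH score:", ich_score(gcs, volume_ml=35.0, intraventricular=True,
                                  infratentorial=False, age_years=72))
```

A calculator like this gives the ground truth against which a language model's free-text scoring can be compared.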

METHODS

We created batches of patient test cases with detailed neurological exams, totaling 20 cases, and created variants of the test cases with increasingly complex phrasing. Using ChatGPT, we assessed repeatability and quantified the errors, including the average error rate (AER) and the average magnitude of errors (AME). We repeated this process for the H&H and ICH scores using base cases. Specific prompts were created for each calculator.
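
The abstract names the two error metrics but does not give their formulas. A minimal sketch, assuming AER is the fraction of scored outputs that differ from the reference score and AME is the mean absolute difference between output and reference over all outputs (including correct ones), could look like the following. The function name and the sample numbers are hypothetical, not study data.

```python
def error_metrics(reference: list[int], predicted: list[int]) -> tuple[float, float]:
    """Return (AER, AME) for paired reference and model scores.

    Assumed definitions (not stated in the abstract):
      AER = share of predictions that differ from the reference score
      AME = mean absolute difference across all predictions
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("reference and predicted must be non-empty and equal in length")
    diffs = [abs(p - r) for r, p in zip(reference, predicted)]
    aer = sum(1 for d in diffs if d != 0) / len(diffs)
    ame = sum(diffs) / len(diffs)
    return aer, ame


if __name__ == "__main__":
    # Hypothetical GCS totals: one of five model outputs is off by one point.
    reference = [15, 12, 9, 7, 3]
    predicted = [15, 12, 10, 7, 3]
    aer, ame = error_metrics(reference, predicted)
    print(f"AER = {aer:.0%}, AME = {ame:.2f}")
```

Under these assumptions, repeated runs of the same case would simply be appended as additional pairs, which is one way a repeatability assessment could feed into the same two numbers.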

RESULTS

The GCS calculator on 10 base test cases had an AER/AME of 10%/0.150. The accuracy of ChatGPT decreased with increasing complexity; for example, in a variation where crucial information was missing, the AER was 45% for 20 cases. For H&H, AER/AME was 13%/0.13 and for ICH, AER/AME was 27.5%/0.325. Using a simple prompt resulted in a significantly higher error rate of 70%.

CONCLUSIONS

In this proof-of-concept experiment, ChatGPT demonstrates the ability to evaluate neurological exams using established assessment scales, including GCS, ICH, and H&H. However, its accuracy is limited, and it may "hallucinate" when descriptions are complex or vague. Nonetheless, ChatGPT has promising potential in medicine.

Similar articles

1
Chat GPT as a Neuro-Score Calculator: Analysis of a Large Language Model's Performance on Various Neurological Exam Grading Scales.
World Neurosurg. 2023 Nov;179:e342-e347. doi: 10.1016/j.wneu.2023.08.088. Epub 2023 Aug 26.
2
Accuracy of a Commercial Large Language Model (ChatGPT) to Perform Disaster Triage of Simulated Patients Using the Simple Triage and Rapid Treatment (START) Protocol: Gage Repeatability and Reproducibility Study.
J Med Internet Res. 2024 Sep 30;26:e55648. doi: 10.2196/55648.
3
Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study.
J Med Internet Res. 2024 Apr 22;26:e54419. doi: 10.2196/54419.
4
Chat-GPT on brain tumors: An examination of Artificial Intelligence/Machine Learning's ability to provide diagnoses and treatment plans for example neuro-oncology cases.
Clin Neurol Neurosurg. 2024 Apr;239:108238. doi: 10.1016/j.clineuro.2024.108238. Epub 2024 Mar 9.
5
Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam.
Int J Nurs Stud. 2024 May;153:104717. doi: 10.1016/j.ijnurstu.2024.104717. Epub 2024 Feb 8.
6
Letter to the Editor Regarding "Chat GPT as a Neuro-score Calculator: Analysis of a Large Language Model's Performance on Various Neurological Exam Grading Scales".
World Neurosurg. 2024 Jan;181:188. doi: 10.1016/j.wneu.2023.08.120.
7
FROM TEXT TO DIAGNOSE: CHATGPT'S EFFICACY IN MEDICAL DECISION-MAKING.
Wiad Lek. 2023;76(11):2345-2350. doi: 10.36740/WLek202311101.
8
Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.
JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.
9
Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
10
Performance of ChatGPT on free-response, clinical reasoning exams.
medRxiv. 2023 Mar 29:2023.03.24.23287731. doi: 10.1101/2023.03.24.23287731.

Cited by

1
Current Landscape and Future Directions Regarding Generative Large Language Models in Stroke Care: Scoping Review.
JMIR Med Inform. 2025 Aug 7;13:e76636. doi: 10.2196/76636.
2
Codeless Development of a Customized SMILE Nomogram Using a Large Language Model: A Practical Framework for Clinicians.
J Ophthalmol. 2025 Jul 15;2025:9930116. doi: 10.1155/joph/9930116. eCollection 2025.
3
Assessment of the Modified Rankin Scale in Electronic Health Records with a Fine-tuned Large Language Model.
medRxiv. 2025 May 2:2025.04.30.25326777. doi: 10.1101/2025.04.30.25326777.
4
Predicting explainable dementia types with LLM-aided feature engineering.
Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf156.
5
The Clinicians' Guide to Large Language Models: A General Perspective With a Focus on Hallucinations.
Interact J Med Res. 2025 Jan 28;14:e59823. doi: 10.2196/59823.
6
What's Going On With Me and How Can I Better Manage My Health? The Potential of GPT-4 to Transform Discharge Letters Into Patient-Centered Letters to Enhance Patient Safety: Prospective, Exploratory Study.
J Med Internet Res. 2025 Jan 21;27:e67143. doi: 10.2196/67143.
7
The Goldilocks Zone: Finding the right balance of user and institutional risk for suicide-related generative AI queries.
PLOS Digit Health. 2025 Jan 8;4(1):e0000711. doi: 10.1371/journal.pdig.0000711. eCollection 2025 Jan.
8
Analyzing evaluation methods for large language models in the medical field: a scoping review.
BMC Med Inform Decis Mak. 2024 Nov 29;24(1):366. doi: 10.1186/s12911-024-02709-7.
9
Large language models as a diagnostic support tool in neuropathology.
J Pathol Clin Res. 2024 Nov;10(6):e70009. doi: 10.1002/2056-4538.70009.
10
Evaluating large language models for health-related text classification tasks with public social media data.
J Am Med Inform Assoc. 2024 Oct 1;31(10):2181-2189. doi: 10.1093/jamia/ocae210.