
Utilizing Artificial Intelligence and Chat Generative Pretrained Transformer to Answer Questions About Clinical Scenarios in Neuroanesthesiology.

Authors

Blacker Samuel N, Kang Mia, Chakraborty Indranil, Chowdhury Tumul, Williams James, Lewis Carol, Zimmer Michael, Wilson Brad, Lele Abhijit V

Affiliations

Department of Anesthesiology, University of North Carolina at Chapel Hill.

Department of Anesthesiology, University of Arkansas.

Publication information

J Neurosurg Anesthesiol. 2023 Dec 19. doi: 10.1097/ANA.0000000000000949.

DOI:10.1097/ANA.0000000000000949
PMID:38124357
Abstract

OBJECTIVE

We tested the ability of chat generative pretrained transformer (ChatGPT), an artificial intelligence chatbot, to answer questions relevant to scenarios covered in 3 clinical guidelines published by the Society for Neuroscience in Anesthesiology and Critical Care (SNACC): endovascular treatment of stroke, perioperative stroke (Stroke), and care of patients undergoing complex spine surgery (Spine).

METHODS

Four neuroanesthesiologists independently assessed whether ChatGPT could apply 52 high-quality recommendations (HQRs) included in the 3 SNACC guidelines. HQRs were deemed present in the ChatGPT responses if noted by at least 3 of the 4 reviewers. Reviewers also identified incorrect references, potentially harmful recommendations, and whether ChatGPT cited the SNACC guidelines.
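The consensus rule described above (an HQR counts as present when at least 3 of the 4 reviewers note it) can be illustrated with a minimal Python sketch. The function names and the example votes are hypothetical and are not taken from the study; only the 3-of-4 threshold comes from the abstract.

```python
from typing import Dict, List

# Threshold from the abstract: at least 3 of the 4 reviewers must note the HQR.
THRESHOLD = 3


def hqr_present(votes: List[bool], threshold: int = THRESHOLD) -> bool:
    """Return True when enough reviewers judged the HQR to be present."""
    return sum(votes) >= threshold


def count_present(all_votes: Dict[str, List[bool]]) -> int:
    """Count how many HQRs meet the consensus threshold."""
    return sum(hqr_present(v) for v in all_votes.values())


# Hypothetical reviewer votes, for illustration only (not study data).
example_votes = {
    "HQR-A": [True, True, True, False],     # 3 of 4: present
    "HQR-B": [True, True, False, False],    # 2 of 4: not present
    "HQR-C": [True, True, True, True],      # 4 of 4: present
    "HQR-D": [False, False, False, False],  # 0 of 4: not present
}

present = count_present(example_votes)
print(f"{present} of {len(example_votes)} example HQRs meet the threshold")
```

Under this rule an even 2-2 split among reviewers counts as not present, so the threshold is stricter than a simple majority tie-break.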

RESULTS

The overall reviewer agreement for the presence of HQRs in the ChatGPT answers ranged from 0% to 100%. Only 4 of 52 (8%) HQRs were deemed present by at least 3 of the 4 reviewers after 5 generic questions, and 23 (44%) HQRs were deemed present after at least 1 additional targeted question. Potentially harmful recommendations were identified for each of the 3 clinical scenarios, and ChatGPT failed to cite the SNACC guidelines.
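As a quick arithmetic check, the reported percentages follow from the counts and the total of 52 HQRs. The short Python sketch below uses only the figures stated in the abstract; the helper name `percent_present` is hypothetical.

```python
TOTAL_HQRS = 52  # total high-quality recommendations across the 3 guidelines


def percent_present(count: int, total: int = TOTAL_HQRS) -> int:
    """Share of HQRs deemed present, rounded to the nearest whole percent."""
    return round(100 * count / total)


generic = percent_present(4)    # after the 5 generic questions
targeted = percent_present(23)  # after at least 1 additional targeted question

print(f"generic: {generic}%, targeted: {targeted}%")
```

Because 23 is fewer than half of 52, the targeted-question result stays below the 50% mark noted in the conclusions.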

CONCLUSIONS

The ChatGPT answers were open to human interpretation regarding whether the responses included the HQRs. Though targeted questions resulted in the inclusion of more HQRs than generic questions, fewer than 50% of HQRs were noted even after targeted questions. This suggests that ChatGPT should not currently be considered a reliable source of information for clinical decision-making. Future iterations of ChatGPT may refine algorithms to improve its reliability as a source of clinical information.


