Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0.

Author Information

Choi Jisun, Oh Ah Ran, Park Jungchan, Kang Ryung A, Yoo Seung Yeon, Lee Dong Jae, Yang Kwangmo

Affiliations

Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea.

Center for Health Promotion, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea.

Publication Information

Front Med (Lausanne). 2024 Jul 11;11:1400153. doi: 10.3389/fmed.2024.1400153. eCollection 2024.

Abstract

INTRODUCTION

The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide data quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures.

METHODS

Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were entered, in English, into two versions of ChatGPT. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment, using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired-sample t-test compared ChatGPT 3.5 and 4.0.
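For readers unfamiliar with the analysis, the sketch below illustrates a paired-sample t-test of this kind using SciPy. The data are simulated placeholders, not the study's ratings, and the 1-5 coding of quality scores is an assumption based on the scales described above.

    # Minimal sketch of the paper's comparison, using simulated placeholder
    # data (NOT the study's ratings). Assumes quality is coded 1-5.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical per-question mean quality ratings for the 30 questions,
    # averaged over the 31 raters, for each ChatGPT version.
    quality_35 = rng.normal(3.40, 0.5, size=30)  # ChatGPT 3.5
    quality_40 = rng.normal(3.73, 0.5, size=30)  # ChatGPT 4.0

    # Descriptive statistics, as in the paper's summary.
    print(f"mean 3.5 = {quality_35.mean():.2f}, mean 4.0 = {quality_40.mean():.2f}")

    # Paired-sample t-test: both versions answered the same 30 questions,
    # so the scores are paired by question.
    result = stats.ttest_rel(quality_35, quality_40)
    print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

The pairing matters here: because each question is answered by both versions, ttest_rel tests the per-question score differences rather than treating the two sets of ratings as independent samples.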

RESULTS

Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40% and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5 and "adequate" in 69% for 4.0. In the overall assessment, 3 points was the most common score for 3.5 (36%), while 4 points predominated for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were −0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in three areas.

CONCLUSION

ChatGPT generated responses that mostly ranged from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5363/11269144/a16773126f72/fmed-11-1400153-g001.jpg
