
Usefulness and Accuracy of Artificial Intelligence Chatbot Responses to Patient Questions for Neurosurgical Procedures.

Author Information

Gajjar Avi A, Kumar Rohit Prem, Paliwoda Ethan D, Kuo Cathleen C, Adida Samuel, Legarreta Andrew D, Deng Hansen, Anand Sharath Kumar, Hamilton D Kojo, Buell Thomas J, Agarwal Nitin, Gerszten Peter C, Hudson Joseph S

Affiliations

Department of Neurological Surgery, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA.

Department of Neurological Surgery, Albany Medical College, Albany, New York, USA.

Publication Information

Neurosurgery. 2024 Feb 14. doi: 10.1227/neu.0000000000002856.

Abstract

BACKGROUND AND OBJECTIVES

The Internet has become a primary source of health information, leading patients to seek answers online before consulting health care providers. This study aims to evaluate the implementation of Chat Generative Pre-Trained Transformer (ChatGPT) in neurosurgery by assessing the accuracy and helpfulness of artificial intelligence (AI)-generated responses to common postsurgical questions.

METHODS

A list of 60 commonly asked questions regarding neurosurgical procedures was developed. Responses from ChatGPT-3.0, ChatGPT-3.5, and ChatGPT-4.0 to these questions were recorded and graded by multiple practitioners for accuracy and helpfulness. The understandability and actionability of the answers were assessed using the Patient Education Materials Assessment Tool. Readability analysis was conducted using established scales.

RESULTS

A total of 1080 responses were evaluated, equally divided among ChatGPT-3.0, 3.5, and 4.0, each contributing 360 responses. The mean helpfulness score across the 3 subsections was 3.511 ± 0.647, while the mean accuracy score was 4.165 ± 0.567. The Patient Education Materials Assessment Tool analysis revealed that the AI-generated responses scored higher on actionability than on understandability, indicating that the answers provided practical guidance and recommendations that patients could apply effectively. On the other hand, the mean Flesch Reading Ease score was 33.5, suggesting that the readability level of the responses was relatively complex. The Raygor Readability Estimate scores fell within the graduate level, with an average at the 15th-grade level.
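The Flesch Reading Ease score cited above is computed from word, sentence, and syllable counts with a standard fixed-coefficient formula; higher scores indicate easier text, and the study's mean of 33.5 falls in the "difficult" (college-level) band. A minimal sketch of that formula follows; the example counts are illustrative and not taken from the study's response corpus:

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).

    Interpretation (approximate): 90-100 very easy, 60-70 plain English,
    30-50 difficult (college), 0-30 very difficult (graduate).
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Illustrative counts (hypothetical, not from the study):
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 3))  # 59.635 -- "fairly difficult" band
```

Longer sentences and more syllables per word both push the score down, which is why dense postsurgical instructions tend to land in the low-30s range reported here.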

CONCLUSION

The artificial intelligence chatbot's responses, although factually accurate, were not rated highly beneficial, with only marginal differences in perceived helpfulness and accuracy between ChatGPT-3.0 and ChatGPT-3.5 versions. Despite this, the responses from ChatGPT-4.0 showed a notable improvement in understandability, indicating enhanced readability over earlier versions.

