
Usefulness and Accuracy of Artificial Intelligence Chatbot Responses to Patient Questions for Neurosurgical Procedures.

Author Information

Gajjar Avi A, Kumar Rohit Prem, Paliwoda Ethan D, Kuo Cathleen C, Adida Samuel, Legarreta Andrew D, Deng Hansen, Anand Sharath Kumar, Hamilton D Kojo, Buell Thomas J, Agarwal Nitin, Gerszten Peter C, Hudson Joseph S

Affiliations

Department of Neurological Surgery, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA.

Department of Neurological Surgery, Albany Medical College, Albany, New York, USA.

Publication Information

Neurosurgery. 2024 Feb 14. doi: 10.1227/neu.0000000000002856.

Abstract

BACKGROUND AND OBJECTIVES

The Internet has become a primary source of health information, leading patients to seek answers online before consulting health care providers. This study aims to evaluate the implementation of Chat Generative Pre-Trained Transformer (ChatGPT) in neurosurgery by assessing the accuracy and helpfulness of artificial intelligence (AI)-generated responses to common postsurgical questions.

METHODS

A list of 60 commonly asked questions regarding neurosurgical procedures was developed. Responses from ChatGPT-3.0, ChatGPT-3.5, and ChatGPT-4.0 to these questions were recorded and graded by multiple practitioners for accuracy and helpfulness. The understandability and actionability of the answers were assessed using the Patient Education Materials Assessment Tool. Readability analysis was conducted using established scales.

RESULTS

A total of 1080 responses were evaluated, equally divided among ChatGPT-3.0, 3.5, and 4.0, each contributing 360 responses. The mean helpfulness score across the 3 subsections was 3.511 ± 0.647, while the mean accuracy score was 4.165 ± 0.567. The Patient Education Materials Assessment Tool analysis revealed that the AI-generated responses scored higher on actionability than on understandability, indicating that the answers provided practical guidance and recommendations that patients could apply effectively. On the other hand, the mean Flesch Reading Ease score was 33.5, suggesting that the readability level of the responses was relatively complex. The Raygor Readability Estimate scores fell within the graduate level, with an average at the 15th-grade level.
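The Flesch Reading Ease score cited above is computed from word, sentence, and syllable counts with a standard fixed-coefficient formula; higher scores indicate easier text, and the study's mean of 33.5 falls in the "difficult" (college-level) band. A minimal sketch of that formula follows; the example counts are illustrative and not taken from the study's response corpus:

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).

    Interpretation (approximate): 90-100 very easy, 60-70 plain English,
    30-50 difficult (college), 0-30 very difficult (graduate).
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Illustrative counts (hypothetical, not from the study):
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 3))  # 59.635 -- "fairly difficult" band
```

Longer sentences and more syllables per word both push the score down, which is why dense postsurgical instructions tend to land in the low-30s range reported here.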

CONCLUSION

The artificial intelligence chatbot's responses, although factually accurate, were not rated highly beneficial, with only marginal differences in perceived helpfulness and accuracy between ChatGPT-3.0 and ChatGPT-3.5 versions. Despite this, the responses from ChatGPT-4.0 showed a notable improvement in understandability, indicating enhanced readability over earlier versions.

