Is ChatGPT-4 a Reliable Tool in Autoimmune Hepatitis?

Authors

Colapietro Francesca, Piovani Daniele, Pugliese Nicola, Aghemo Alessio, Ronca Vincenzo, Lleo Ana

Affiliations

Department of Biomedical Sciences, Humanitas University, Milan, Italy.

IRCCS Humanitas Research Hospital, Department of Gastroenterology, Division of Internal Medicine and Hepatology, Milan, Italy.

Publication information

Am J Gastroenterol. 2025 Apr 1;120(4):914-919. doi: 10.14309/ajg.0000000000003179. Epub 2024 Oct 31.

Abstract

INTRODUCTION

Artificial intelligence-based chatbots offer a potential avenue for delivering personalized counseling to patients with autoimmune hepatitis. We assessed accuracy, completeness, comprehensiveness, and safety of Chat Generative Pretrained Transformer-4 responses to 12 inquiries out of a pool of 40 questions posed by 4 patients with autoimmune hepatitis.

METHODS

Questions were categorized into 3 areas: diagnosis (1-3), quality of life (4-8), and medical treatment (9-12). Eleven key opinion leaders evaluated the responses using Likert scales: 6 points for accuracy, 5 points for safety, and 3 points each for completeness and comprehensiveness.

RESULTS

Median scores for accuracy, completeness, comprehensiveness, and safety were 5 (4-6), 2 (2-2), and 3 (2-3), respectively; no domain exhibited superior evaluation. The postdiagnosis follow-up question was the trickiest, with low accuracy and completeness but safe and comprehensive features. Agreement among key opinion leaders (Fleiss Kappa statistics) was slight for accuracy (0.05) but poor for the remaining features (-0.05, -0.06, and -0.02, respectively).
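For readers unfamiliar with the agreement statistic named above, the following is a minimal sketch of how Fleiss' kappa can be computed from a set of ratings. The abstract does not describe the authors' actual computation or software, so the function name `fleiss_kappa`, the input layout (one list of rater scores per question), and the toy ratings are all assumptions made for illustration; none of the numbers are study data. A value of 1 indicates perfect agreement, 0 indicates agreement at the level expected by chance, and negative values indicate agreement below chance.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    `ratings` is a hypothetical layout: ratings[i] holds the categorical scores
    that every rater gave to item i (for example, Likert points).
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("at least one rated item is required")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every item must be rated by the same number of raters")

    category_totals = Counter()
    per_item_agreement = []
    for row in ratings:
        counts = Counter(row)  # how many raters chose each category for this item
        category_totals.update(counts)
        # Proportion of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        per_item_agreement.append(agreeing / (n_raters * (n_raters - 1)))

    observed = sum(per_item_agreement) / n_items
    total_ratings = n_items * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((c / total_ratings) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (observed - expected) / (1.0 - expected)


# Toy ratings (illustrative only): 3 raters, 2 items, full agreement on each item.
perfect = [[5, 5, 5], [3, 3, 3]]
kappa_perfect = fleiss_kappa(perfect)  # 1.0

# Toy ratings where two raters always disagree: agreement is below chance.
split = [[1, 2], [1, 2]]
kappa_split = fleiss_kappa(split)  # -1.0
```

The second toy case shows why kappa can be negative, as it is for three of the four features reported above: the raters agree less often than their overall score distribution would predict by chance.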

DISCUSSION

Chatbots show good comprehensibility, but lack reliability. Further studies are needed to integrate Chat Generative Pretrained Transformer within clinical practice.
