Evaluation of four chatbots in autoimmune liver disease: A comparative analysis.

Authors

Daza Jimmy, Bezerra Lucas Soares, Santamaría Laura, Rueda-Esteban Roberto, Bantel Heike, Girala Marcos, Ebert Matthias, Van Bömmel Florian, Geier Andreas, Aldana Andres Gomez, Yau Kevin, Alvares-da-Silva Mario, Peck-Radosavljevic Markus, Ridruejo Ezequiel, Weinmann Arndt, Teufel Andreas

Affiliations

Division of Hepatology, Division of Clinical Bioinformatics, Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.

Universidad de Los Andes School of Medicine, Bogotá, Colombia.

Publication information

Ann Hepatol. 2024 Aug 13;30(1):101537. doi: 10.1016/j.aohep.2024.101537.

DOI: 10.1016/j.aohep.2024.101537
PMID: 39147133

Abstract

INTRODUCTION AND OBJECTIVES

Autoimmune liver diseases (AILDs) are rare and require precise evaluation, which is often challenging for medical providers. Chatbots are innovative solutions to assist healthcare professionals in clinical management. In our study, ten liver specialists systematically evaluated four chatbots to determine their utility as clinical decision support tools in the field of AILDs.

MATERIALS AND METHODS

We constructed a 56-question questionnaire focusing on the evaluation, diagnosis, and management of three AILDs: Autoimmune Hepatitis (AIH), Primary Biliary Cholangitis (PBC), and Primary Sclerosing Cholangitis (PSC). Four chatbots (ChatGPT 3.5, Claude, Microsoft Copilot, and Google Bard) were presented with the questions in their free tiers in December 2023. Responses underwent critical evaluation by ten liver specialists using a standardized 1 to 10 Likert scale. The analysis included mean scores, the number of highest-rated replies, and the identification of common shortcomings in chatbot performance.

RESULTS

Among the assessed chatbots, specialists rated Claude highest with a mean score of 7.37 (SD = 1.91), followed by ChatGPT (7.17, SD = 1.89), Microsoft Copilot (6.63, SD = 2.10), and Google Bard (6.52, SD = 2.27). Claude also excelled with 27 best-rated replies, outperforming ChatGPT (20), while Microsoft Copilot and Google Bard lagged with only 6 and 9, respectively. Common deficiencies included listing details over specific advice, limited dosing options, inaccuracies for pregnant patients, insufficient recent data, over-reliance on CT and MRI imaging, and inadequate discussion regarding off-label use and fibrates in PBC treatment. Notably, internet access for Microsoft Copilot and Google Bard did not enhance precision compared to pre-trained models.
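
The two summary statistics reported above (mean score with SD per chatbot, and the number of best-rated replies) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The `ratings` data, the `summarize` helper, the use of the sample standard deviation, and the rule that tied chatbots each receive credit for a best-rated reply are all assumptions made for illustration. The tie rule is only a guess prompted by the reported counts (27 + 20 + 6 + 9 = 62) exceeding the 56 questions; the abstract does not state how ties were handled.

```python
from statistics import mean, stdev

# Hypothetical ratings, NOT data from the study:
# ratings[question][chatbot] -> list of specialist scores on a 1-10 scale.
ratings = {
    "Q1": {"Claude": [8, 7, 9], "ChatGPT": [7, 7, 8],
           "Copilot": [6, 5, 7], "Bard": [6, 6, 5]},
    "Q2": {"Claude": [7, 8, 6], "ChatGPT": [8, 7, 6],
           "Copilot": [7, 6, 6], "Bard": [5, 7, 6]},
}


def summarize(ratings):
    """Return ({chatbot: (mean, sd)}, {chatbot: best-rated count})."""
    all_scores = {}   # every individual score, pooled per chatbot
    best_counts = {}  # questions on which the chatbot had the top mean
    for per_bot in ratings.values():
        question_means = {bot: mean(s) for bot, s in per_bot.items()}
        top = max(question_means.values())
        for bot, scores in per_bot.items():
            all_scores.setdefault(bot, []).extend(scores)
            best_counts.setdefault(bot, 0)
            # Assumption: every chatbot tied for the top mean gets credit.
            if question_means[bot] == top:
                best_counts[bot] += 1
    # Assumption: SD is the sample standard deviation over pooled scores.
    summary = {bot: (round(mean(s), 2), round(stdev(s), 2))
               for bot, s in all_scores.items()}
    return summary, best_counts


summary, best_counts = summarize(ratings)
for bot, (m, sd) in summary.items():
    print(f"{bot}: mean={m}, SD={sd}, best-rated={best_counts[bot]}")
```

In this toy data, Claude and ChatGPT tie on the second question, so the best-rated counts add up to three across two questions, the same kind of surplus seen in the reported figures.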

CONCLUSIONS

Chatbots hold promise in AILD support, but our study underscores key areas for improvement. Refinement is needed in providing specific advice, accuracy, and focused up-to-date information. Addressing these shortcomings is essential for enhancing the utility of chatbots in AILD management, guiding future development, and ensuring their effectiveness as clinical decision-support tools.
