
Evaluation of four chatbots in autoimmune liver disease: A comparative analysis.

Author information

Daza Jimmy, Bezerra Lucas Soares, Santamaría Laura, Rueda-Esteban Roberto, Bantel Heike, Girala Marcos, Ebert Matthias, Van Bömmel Florian, Geier Andreas, Aldana Andres Gomez, Yau Kevin, Alvares-da-Silva Mario, Peck-Radosavljevic Markus, Ridruejo Ezequiel, Weinmann Arndt, Teufel Andreas

Affiliations

Division of Hepatology, Division of Clinical Bioinformatics, Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.

Universidad de Los Andes School of Medicine, Bogotá, Colombia.

Publication information

Ann Hepatol. 2024 Aug 13;30(1):101537. doi: 10.1016/j.aohep.2024.101537.

Abstract

INTRODUCTION AND OBJECTIVES

Autoimmune liver diseases (AILDs) are rare and require precise evaluation, which is often challenging for medical providers. Chatbots are innovative solutions to assist healthcare professionals in clinical management. In our study, ten liver specialists systematically evaluated four chatbots to determine their utility as clinical decision support tools in the field of AILDs.

MATERIALS AND METHODS

We constructed a 56-question questionnaire focusing on the evaluation, diagnosis, and management of the AILDs Autoimmune Hepatitis (AIH), Primary Biliary Cholangitis (PBC), and Primary Sclerosing Cholangitis (PSC). Four chatbots (ChatGPT 3.5, Claude, Microsoft Copilot, and Google Bard) were presented with the questions via their free tiers in December 2023. Responses underwent critical evaluation by ten liver specialists using a standardized 1-to-10 Likert scale. The analysis included mean scores, the number of highest-rated replies, and the identification of common shortcomings in chatbot performance.
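The quantitative part of this evaluation reduces to simple aggregation of the specialist ratings. The sketch below illustrates that arithmetic on synthetic data only; the array shapes, variable names, and the tie-handling rule for "highest-rated replies" are assumptions made for illustration and are not taken from the study.

```python
import numpy as np

# Hypothetical illustration only: per chatbot, a (questions x raters) matrix of 1-10 Likert scores.
# The study used 56 questions and 10 liver specialists; the values generated here are toy data,
# not the ratings reported in the paper.
rng = np.random.default_rng(seed=0)
ratings = {
    "Claude": rng.integers(1, 11, size=(56, 10)),
    "ChatGPT 3.5": rng.integers(1, 11, size=(56, 10)),
    "Microsoft Copilot": rng.integers(1, 11, size=(56, 10)),
    "Google Bard": rng.integers(1, 11, size=(56, 10)),
}

# Overall mean and sample standard deviation of all ratings for each chatbot.
for name, scores in ratings.items():
    print(f"{name}: mean = {scores.mean():.2f}, SD = {scores.std(ddof=1):.2f}")

# One way to operationalize "number of highest-rated replies": for each question, average the
# ten specialist ratings per chatbot and credit the chatbot(s) with the highest per-question mean.
# The tie handling here is an assumption; the paper does not specify it.
per_question_means = {name: scores.mean(axis=1) for name, scores in ratings.items()}
best_counts = dict.fromkeys(ratings, 0)
for q in range(56):
    top = max(means[q] for means in per_question_means.values())
    for name, means in per_question_means.items():
        if means[q] == top:
            best_counts[name] += 1

print(best_counts)
```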

RESULTS

Among the assessed chatbots, specialists rated Claude highest with a mean score of 7.37 (SD = 1.91), followed by ChatGPT (7.17, SD = 1.89), Microsoft Copilot (6.63, SD = 2.10), and Google Bard (6.52, SD = 2.27). Claude also led with 27 best-rated replies, ahead of ChatGPT (20), while Microsoft Copilot and Google Bard lagged behind with only 6 and 9, respectively. Common deficiencies included listing general details rather than giving specific advice, limited coverage of dosing options, inaccuracies regarding pregnant patients, insufficient recent data, over-reliance on CT and MRI imaging, and inadequate discussion of off-label use and of fibrates in PBC treatment. Notably, internet access for Microsoft Copilot and Google Bard did not enhance precision compared with the purely pre-trained models.

CONCLUSIONS

Chatbots hold promise as support tools in AILDs, but our study underscores key areas for improvement. Refinement is needed in the specificity of advice, factual accuracy, and the provision of focused, up-to-date information. Addressing these shortcomings is essential for enhancing the utility of chatbots in AILD management, guiding future development, and ensuring their effectiveness as clinical decision-support tools.

