
Use of artificial intelligence chatbots in clinical management of immune-related adverse events.

Affiliations

Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Department of Oncology, Johns Hopkins University, Baltimore, Maryland, USA.

Publication information

J Immunother Cancer. 2024 May 30;12(5):e008599. doi: 10.1136/jitc-2023-008599.

Abstract

BACKGROUND

Artificial intelligence (AI) chatbots have become a major source of general and medical information, though their accuracy and completeness are still being assessed. Their utility in answering questions about immune-related adverse events (irAEs), common and potentially dangerous toxicities of cancer immunotherapy, is not well defined.

METHODS

We developed 50 distinct questions, spanning 10 irAE categories, whose answers are available in published guidelines, and queried two AI chatbots (ChatGPT and Bard), along with an additional 20 patient-specific scenarios. Experts in irAE management scored answers for accuracy and completeness using a Likert scale ranging from 1 (least accurate/complete) to 4 (most accurate/complete). Answers were compared across categories and across engines.

RESULTS

Overall, both engines scored highly for accuracy (mean scores for ChatGPT and Bard were 3.87 vs 3.5, p<0.01) and completeness (3.83 vs 3.46, p<0.01). Scores of 1-2 (completely or mostly inaccurate or incomplete) were particularly rare for ChatGPT (6/800 answer-ratings, 0.75%). Of the 50 questions, all eight physician raters gave ChatGPT a rating of 4 (fully accurate or complete) for 22 questions (for accuracy) and 16 questions (for completeness). In the 20 patient scenarios, the average accuracy score was 3.725 (median 4) and the average completeness score was 3.61 (median 4).

CONCLUSIONS

AI chatbots provided largely accurate and complete information regarding irAEs, and wildly inaccurate information ("hallucinations") was uncommon. However, until accuracy and completeness improve further, published guidelines remain the gold standard to follow.

