

Use of artificial intelligence chatbots in clinical management of immune-related adverse events.

Affiliations

Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Department of Oncology, Johns Hopkins University, Baltimore, Maryland, USA.

Publication Info

J Immunother Cancer. 2024 May 30;12(5):e008599. doi: 10.1136/jitc-2023-008599.


DOI: 10.1136/jitc-2023-008599
PMID: 38816231
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11141185/
Abstract

BACKGROUND: Artificial intelligence (AI) chatbots have become a major source of general and medical information, though their accuracy and completeness are still being assessed. Their utility for answering questions surrounding immune-related adverse events (irAEs), common and potentially dangerous toxicities from cancer immunotherapy, is not well defined.

METHODS: We developed 50 distinct questions, with answers available in existing guidelines, spanning 10 irAE categories, and queried two AI chatbots (ChatGPT and Bard), along with an additional 20 patient-specific scenarios. Experts in irAE management scored answers for accuracy and completeness using a Likert scale ranging from 1 (least accurate/complete) to 4 (most accurate/complete). Answers were compared across categories and across engines.

RESULTS: Overall, both engines scored highly for accuracy (mean scores for ChatGPT and Bard were 3.87 vs 3.5, p<0.01) and completeness (3.83 vs 3.46, p<0.01). Scores of 1-2 (completely or mostly inaccurate or incomplete) were particularly rare for ChatGPT (6/800 answer-ratings, 0.75%). Of the 50 questions, all eight physician raters gave ChatGPT a rating of 4 (fully accurate or complete) for 22 questions (for accuracy) and 16 questions (for completeness). In the 20 patient scenarios, the average accuracy score was 3.725 (median 4) and the average completeness score was 3.61 (median 4).

CONCLUSIONS: AI chatbots provided largely accurate and complete information regarding irAEs, and wildly inaccurate information ("hallucinations") was uncommon. However, until accuracy and completeness increase further, appropriate guidelines remain the gold standard to follow.
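The summary statistics reported above (means, medians, and the low-score proportion) are straightforward aggregations of 1-4 Likert ratings. A minimal sketch of how such summaries might be computed, using hypothetical rater scores that are not the study's data:

```python
from statistics import mean, median

# Hypothetical 1-4 Likert ratings from eight raters for one question
# (illustrative values only; not data from the study)
chatgpt = [4, 4, 3, 4, 4, 4, 3, 4]
bard    = [4, 3, 3, 4, 3, 4, 3, 4]

print(f"ChatGPT: mean={mean(chatgpt):.2f}, median={median(chatgpt)}")
print(f"Bard:    mean={mean(bard):.2f}, median={median(bard)}")

# Proportion of low (score 1-2) answer-ratings, as reported for ChatGPT:
# 6 of 800 answer-ratings
low_ratings, total = 6, 800
print(f"Low-score rate: {low_ratings / total:.2%}")  # 0.75%
```

The reported between-engine p-values would additionally require a significance test over the full rating matrix, which the abstract does not specify.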


Similar Articles

[1]
Use of artificial intelligence chatbots in clinical management of immune-related adverse events.

J Immunother Cancer. 2024-5-30

[2]
Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI.

Cureus. 2024-1-2

[3]
Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients.

Vascular. 2025-2

[4]
Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer Lu-PSMA-617 therapy.

Front Oncol. 2024-7-12

[5]
The performance of artificial intelligence chatbot large language models to address skeletal biology and bone health queries.

J Bone Miner Res. 2024-3-22

[6]
Performance of Artificial Intelligence Chatbots on Glaucoma Questions Adapted From Patient Brochures.

Cureus. 2024-3-23

[7]
Artificial intelligence-powered chatbots in search engines: a cross-sectional study on the quality and risks of drug information for patients.

BMJ Qual Saf. 2025-1-28

[8]
The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard.

Am J Orthod Dentofacial Orthop. 2024-6

[9]
Are artificial intelligence chatbots a reliable source of information about contact lenses?

Cont Lens Anterior Eye. 2024-4

[10]
Evaluating the accuracy and reliability of AI chatbots in disseminating the content of current resuscitation guidelines: a comparative analysis between the ERC 2021 guidelines and both ChatGPTs 3.5 and 4.

Scand J Trauma Resusc Emerg Med. 2024-9-26

Cited By

[1]
Evaluating the Accuracy, Completeness, and Readability of Chatbot Responses to Refractive Surgery-Related Patient Questions: A Comparative Analysis of ChatGPT and Google Gemini.

Cureus. 2025-7-29

[2]
Large language models in oncology: a review.

BMJ Oncol. 2025-5-15

[3]
Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini.

JMIR Perioper Med. 2025-6-12

[4]
Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study.

J Med Internet Res. 2025-6-4

[5]
Optimizing Immunotherapy: The Synergy of Immune Checkpoint Inhibitors with Artificial Intelligence in Melanoma Treatment.

Biomolecules. 2025-4-16

[6]
Exploring the capabilities of GenAI for oral cancer consultations in remote consultations.

BMC Oral Health. 2025-2-20

[7]
Use of artificial intelligence chatbots in clinical management of immune-related adverse events.

J Immunother Cancer. 2024-12-4

[8]
Understanding AI's Role in Endometriosis Patient Education and Evaluating Its Information and Accuracy: Systematic Review.

JMIR AI. 2024-10-30

[9]
Performance of Multimodal Artificial Intelligence Chatbots Evaluated on Clinical Oncology Cases.

JAMA Netw Open. 2024-10-1

References

[1]
ChatGPT vs. neurologists: a cross-sectional study investigating preference, satisfaction ratings and perceived empathy in responses among people living with multiple sclerosis.

J Neurol. 2024-7

[2]
Performance of large language models on benign prostatic hyperplasia frequently asked questions.

Prostate. 2024-6

[3]
ChatGPT as a Diagnostic Aid in Alzheimer's Disease: An Exploratory Study.

J Alzheimers Dis Rep. 2024-3-19

[4]
Accuracy of Information given by ChatGPT for Patients with Inflammatory Bowel Disease in Relation to ECCO Guidelines.

J Crohns Colitis. 2024-8-14

[5]
Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study.

Oncologist. 2024-5-3

[6]
Urological Cancers and ChatGPT: Assessing the Quality of Information and Possible Risks for Patients.

Clin Genitourin Cancer. 2024-4

[7]
Exploring the Role of Artificial Intelligence Chatbots in Preoperative Counseling for Head and Neck Cancer Surgery.

Laryngoscope. 2024-6

[8]
Accuracy and Reliability of Chatbot Responses to Physician Questions.

JAMA Netw Open. 2023-10-2

[9]
Use of Artificial Intelligence Chatbots for Cancer Treatment Information.

JAMA Oncol. 2023-10-1

[10]
Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer.

JAMA Oncol. 2023-10-1
