Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer Lu-PSMA-617 therapy.

Authors

Belge Bilgin Gokce, Bilgin Cem, Childs Daniel S, Orme Jacob J, Burkett Brian J, Packard Ann T, Johnson Derek R, Thorpe Matthew P, Riaz Irbaz Bin, Halfdanarson Thorvardur R, Johnson Geoffrey B, Sartor Oliver, Kendi Ayse Tuba

Affiliations

Department of Radiology, Mayo Clinic, Rochester, MN, United States.

Division of Medical Oncology, Department of Oncology, Mayo Clinic, Rochester, MN, United States.

Publication Information

Front Oncol. 2024 Jul 12;14:1386718. doi: 10.3389/fonc.2024.1386718. eCollection 2024.

Abstract

BACKGROUND

Many patients use artificial intelligence (AI) chatbots as a rapid source of health information. This raises important questions about the reliability and effectiveness of AI chatbots in delivering accurate and understandable information.

PURPOSE

To evaluate and compare the accuracy, conciseness, and readability of responses from OpenAI ChatGPT-4 and Google Bard to patient inquiries concerning the novel Lu-PSMA-617 therapy for prostate cancer.

MATERIALS AND METHODS

Two experts compiled the 12 questions most commonly asked by patients about Lu-PSMA-617 therapy. These 12 questions were posed to OpenAI ChatGPT-4 and Google Bard. The AI-generated responses were distributed through an online survey platform (Qualtrics) and rated in blinded fashion by eight experts. The performance of the two chatbots was evaluated and compared across three domains: accuracy, conciseness, and readability. Potential safety concerns associated with the AI-generated answers were also examined. The Mann-Whitney U and chi-square tests were used to compare the chatbots' performance.
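
For illustration, a minimal sketch in Python of this kind of rating comparison (not the authors' analysis code; the rating arrays and the 4-point scale are hypothetical placeholders):

```python
# Minimal sketch of the ordinal rating comparison described above.
# Assumption: a 4-point scale; the arrays below are random placeholders,
# not the study's data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# 96 hypothetical accuracy ratings per chatbot (12 responses x 8 experts).
chatgpt4_ratings = rng.integers(1, 5, size=96)
bard_ratings = rng.integers(1, 5, size=96)

# Mann-Whitney U test compares the two sets of ordinal scores.
u_stat, p_value = mannwhitneyu(chatgpt4_ratings, bard_ratings,
                               alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")
```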

RESULTS

Eight experts participated in the survey, evaluating 12 AI-generated responses across the three domains of accuracy, conciseness, and readability, yielding 96 assessments (12 responses × 8 experts) per domain for each chatbot. ChatGPT-4 provided more accurate answers than Bard (2.95 ± 0.671 vs 2.73 ± 0.732, p = 0.027). Bard's responses had better readability than ChatGPT-4's (2.79 ± 0.408 vs 2.94 ± 0.243, p = 0.003). ChatGPT-4 and Bard achieved comparable conciseness scores (3.14 ± 0.659 vs 3.11 ± 0.679, p = 0.798). Experts categorized the AI-generated responses as incorrect or partially correct at a rate of 16.6% for ChatGPT-4 and 29.1% for Bard. Bard's answers contained significantly more misleading information than ChatGPT-4's (p = 0.039).
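
The reported rates imply roughly 16 of 96 assessments rated incorrect or partially correct for ChatGPT-4 and 28 of 96 for Bard. A chi-square test on such a 2×2 table could be run as below; this is a hedged sketch, and the exact counts are an assumption inferred from the percentages, not taken from the paper:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table. The counts assume the reported
# 16.6% and 29.1% rates correspond to about 16 and 28 of the 96
# assessments per chatbot; the authors' exact table is not given.
observed = [
    [16, 96 - 16],  # ChatGPT-4: [incorrect/partially correct, correct]
    [28, 96 - 28],  # Bard
]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```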

CONCLUSION

AI chatbots have gained significant attention, and their performance is continuously improving. Nonetheless, these technologies still need further improvement before they can be considered reliable and credible sources for patients seeking medical information on Lu-PSMA-617 therapy.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/744e/11272524/e53ef3a5b42e/fonc-14-1386718-g001.jpg
