Al-Ashwal Fahmi Y, Zawiah Mohammed, Gharaibeh Lobna, Abu-Farha Rana, Bitar Ahmad Naoras
Department of Clinical Pharmacy and Pharmacy Practice, Faculty of Pharmacy, University of Science and Technology, Sana'a, Yemen.
College of Pharmacy, Al-Ayen University, Thi-Qar, Iraq.
Drug Healthc Patient Saf. 2023 Sep 20;15:137-147. doi: 10.2147/DHPS.S425858. eCollection 2023.
AI platforms are equipped with advanced algorithms that have potential applications across a wide range of healthcare services. However, data on the accuracy of AI chatbots relative to conventional drug-drug interaction (DDI) screening tools are limited. This study aimed to assess the sensitivity, specificity, and accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard in predicting DDIs.
Four AI-based chatbots (i.e., ChatGPT-3.5, ChatGPT-4, Microsoft Bing AI, and Google Bard) were compared on their ability to detect clinically relevant drug-drug interactions (DDIs) for 255 drug pairs. Performance metrics, including specificity, sensitivity, accuracy, negative predictive value (NPV), and positive predictive value (PPV), were calculated for each tool.
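The five metrics named above all derive from a standard 2x2 confusion matrix of chatbot predictions against the reference tool. A minimal sketch of that calculation is below; the counts used in the example are hypothetical and chosen only so the total matches the 255 drug pairs, not taken from the study's data.

```python
def dx_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Diagnostic-accuracy metrics from a 2x2 confusion matrix.

    tp/fp/tn/fn: true/false positives and negatives, counted against
    the reference DDI tool's classification.
    """
    return {
        "sensitivity": tp / (tp + fn),           # true-positive rate
        "specificity": tn / (tn + fp),           # true-negative rate
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for illustration only (sum to 255 drug pairs):
m = dx_metrics(tp=150, fp=30, tn=70, fn=5)
print({k: round(v, 3) for k, v in m.items()})
```

Note that with imbalanced references (many true interactions, few true non-interactions), a tool can score high sensitivity while specificity stays low, which is the pattern reported below for ChatGPT-3.5.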
With a subscription-based tool as the reference, specificity ranged from a low of 0.372 (ChatGPT-3.5) to a high of 0.769 (Microsoft Bing AI). Microsoft Bing AI also had the highest accuracy (0.788), while ChatGPT-3.5 had the lowest (0.469). All chatbots performed better overall when a free DDI source served as the reference, but ChatGPT-3.5 still had the lowest specificity (0.392) and accuracy (0.525), and Microsoft Bing AI again demonstrated the highest specificity (0.892) and accuracy (0.890). When the consistency of accuracy was assessed across two different drug classes, ChatGPT-3.5 and ChatGPT-4 showed the greatest variability. In addition, ChatGPT-3.5, ChatGPT-4, and Bard exhibited the largest fluctuations in specificity when analyzing two medications belonging to the same drug class.
Bing AI had the highest accuracy and specificity, outperforming Google's Bard, ChatGPT-3.5, and ChatGPT-4. The findings highlight the significant potential these AI tools hold in transforming patient care. While the current AI platforms evaluated are not without limitations, their ability to quickly analyze potentially significant interactions with good sensitivity suggests a promising step towards improved patient safety.