Al-Ashwal Fahmi Y, Zawiah Mohammed, Gharaibeh Lobna, Abu-Farha Rana, Bitar Ahmad Naoras
Department of Clinical Pharmacy and Pharmacy Practice, Faculty of Pharmacy, University of Science and Technology, Sana'a, Yemen.
College of Pharmacy, Al-Ayen University, Thi-Qar, Iraq.
Drug Healthc Patient Saf. 2023 Sep 20;15:137-147. doi: 10.2147/DHPS.S425858. eCollection 2023.
AI platforms are equipped with advanced algorithms that have potential applications across a wide range of healthcare services. However, data on the accuracy of AI chatbots relative to conventional drug-drug interaction (DDI) screening tools are limited. This study aimed to assess the sensitivity, specificity, and accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard in predicting DDIs.
Four AI-based chatbots (i.e., ChatGPT-3.5, ChatGPT-4, Microsoft Bing AI, and Google Bard) were compared on their ability to detect clinically relevant drug-drug interactions (DDIs) for 255 drug pairs. Performance metrics, including specificity, sensitivity, accuracy, negative predictive value (NPV), and positive predictive value (PPV), were calculated for each tool.
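The five metrics named above all derive from a standard 2x2 confusion matrix of chatbot predictions against the reference tool. A minimal sketch of that calculation is below; the counts used in the example are hypothetical and chosen only so the total matches the 255 drug pairs, not taken from the study's data.

```python
def dx_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Diagnostic-accuracy metrics from a 2x2 confusion matrix.

    tp/fp/tn/fn: true/false positives and negatives, counted against
    the reference DDI tool's classification.
    """
    return {
        "sensitivity": tp / (tp + fn),           # true-positive rate
        "specificity": tn / (tn + fp),           # true-negative rate
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for illustration only (sum to 255 drug pairs):
m = dx_metrics(tp=150, fp=30, tn=70, fn=5)
print({k: round(v, 3) for k, v in m.items()})
```

Note that with imbalanced references (many true interactions, few true non-interactions), a tool can score high sensitivity while specificity stays low, which is the pattern reported below for ChatGPT-3.5.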
With a subscription-based tool as the reference, specificity ranged from a low of 0.372 (ChatGPT-3.5) to a high of 0.769 (Microsoft Bing AI). Microsoft Bing AI also had the highest accuracy (0.788), while ChatGPT-3.5 had the lowest (0.469). All chatbots performed better overall when a free DDI source served as the reference, but ChatGPT-3.5 still had the lowest specificity (0.392) and accuracy (0.525), and Microsoft Bing AI again demonstrated the highest specificity (0.892) and accuracy (0.890). When the consistency of accuracy was assessed across two different drug classes, ChatGPT-3.5 and ChatGPT-4 showed the greatest variability. In addition, ChatGPT-3.5, ChatGPT-4, and Bard exhibited the largest fluctuations in specificity when analyzing two medications belonging to the same drug class.
Bing AI had the highest accuracy and specificity, outperforming Google's Bard, ChatGPT-3.5, and ChatGPT-4. The findings highlight the significant potential these AI tools hold in transforming patient care. While the current AI platforms evaluated are not without limitations, their ability to quickly analyze potentially significant interactions with good sensitivity suggests a promising step towards improved patient safety.