Ramasubramanian Swaminathan, Balaji Sangeetha, Kannan Tejashri, Jeyaraman Naveen, Sharma Shilpa, Migliorini Filippo, Balasubramaniam Suhasini, Jeyaraman Madhan
Department of Orthopaedics, Government Medical College, Omandurar Government Estate, Chennai 600002, Tamil Nadu, India.
Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India.
World J Methodol. 2024 Dec 20;14(4):92802. doi: 10.5662/wjm.v14.i4.92802.
Medication errors, especially in dosage calculation, pose risks in healthcare. Artificial intelligence (AI) systems like ChatGPT and Google Bard may help reduce errors, but their accuracy in providing medication information remains to be evaluated.
To evaluate the accuracy of AI systems (ChatGPT 3.5, ChatGPT 4, Google Bard) in providing drug dosage information, using Harrison's Principles of Internal Medicine as the reference.
A set of natural language queries mimicking real-world medical dosage inquiries was presented to the AI systems. Responses were analyzed using a 3-point Likert scale. The analysis, conducted with Python and its libraries, focused on basic statistics, overall system accuracy, and disease-specific and organ system accuracies.
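The kind of analysis described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the abstract does not state how the 3-point scale was coded or weighted, so the coding (1 = incorrect, 2 = partially correct, 3 = correct), the `WEIGHTS` mapping, the function names, and the example records are all assumptions.

```python
from collections import defaultdict

# Assumed weights for the 3-point scale; the abstract does not specify them.
WEIGHTS = {1: 0.0, 2: 0.5, 3: 1.0}


def correct_rate(scores):
    """Fraction of responses rated fully correct (score 3)."""
    return sum(1 for s in scores if s == 3) / len(scores)


def weighted_accuracy(scores):
    """Mean of the assumed per-score weights."""
    return sum(WEIGHTS[s] for s in scores) / len(scores)


def accuracy_by_group(records):
    """Weighted accuracy per group (e.g. disease or organ system).

    `records` is an iterable of (group, score) pairs.
    """
    grouped = defaultdict(list)
    for group, score in records:
        grouped[group].append(score)
    return {g: weighted_accuracy(v) for g, v in grouped.items()}


# Made-up ratings for one AI system, for illustration only.
records = [
    ("hypertension", 3),
    ("hypertension", 2),
    ("asthma", 3),
    ("asthma", 1),
]
scores = [score for _, score in records]

print(correct_rate(scores))
print(weighted_accuracy(scores))
print(accuracy_by_group(records))
```

Running the same functions once per system would give the overall and per-disease comparisons; a different weighting scheme would change the weighted accuracy but not the correct-response rate.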
ChatGPT 4 outperformed the other systems, showing the highest rate of correct responses (83.77%) and the best overall weighted accuracy (0.6775). Disease-specific accuracy varied notably across systems: dosage information was accurate for some diseases, while others showed significant discrepancies. Organ system accuracy was also variable, underscoring system-specific strengths and weaknesses.
ChatGPT 4 demonstrates superior reliability in providing medical dosage information, yet the variation across diseases emphasizes the need for ongoing improvement. These results highlight AI's potential to aid healthcare professionals and call for continued development to ensure dependable accuracy in critical medical situations.