Comparative evaluation of artificial intelligence systems' accuracy in providing medical drug dosages: A methodological study.

Author information

Ramasubramanian Swaminathan, Balaji Sangeetha, Kannan Tejashri, Jeyaraman Naveen, Sharma Shilpa, Migliorini Filippo, Balasubramaniam Suhasini, Jeyaraman Madhan

Affiliations

Department of Orthopaedics, Government Medical College, Omandurar Government Estate, Chennai 600002, Tamil Nadu, India.

Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India.

Publication information

World J Methodol. 2024 Dec 20;14(4):92802. doi: 10.5662/wjm.v14.i4.92802.

DOI:10.5662/wjm.v14.i4.92802

PMID:39712564

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11287534/

Abstract

BACKGROUND

Medication errors, especially in dosage calculation, pose risks in healthcare. Artificial intelligence (AI) systems like ChatGPT and Google Bard may help reduce errors, but their accuracy in providing medication information remains to be evaluated.

AIM

To evaluate the accuracy of AI systems (ChatGPT 3.5, ChatGPT 4, Google Bard) in providing drug dosage information per Harrison's Principles of Internal Medicine.

METHODS

A set of natural language queries mimicking real-world medical dosage inquiries was presented to the AI systems. Responses were analyzed using a 3-point Likert scale. The analysis, conducted with Python and its libraries, focused on basic statistics, overall system accuracy, and disease-specific and organ system accuracies.
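The abstract does not give the scoring formula, so the following is only a minimal sketch of how a correct-response rate and a weighted accuracy could be computed from 3-point Likert ratings, overall and per disease or organ system. The rating codes (1 = incorrect, 2 = partially correct, 3 = correct), the weights in `LIKERT_WEIGHTS`, and the function names `summarize` and `accuracy_by_group` are all assumptions made for illustration, not the authors' actual method.

```python
from collections import Counter

# Assumed coding of the 3-point Likert scale (not taken from the paper):
# 1 = incorrect, 2 = partially correct, 3 = correct.
# The weights below are a hypothetical choice; the study's scheme may differ.
LIKERT_WEIGHTS = {1: 0.0, 2: 0.5, 3: 1.0}


def summarize(ratings):
    """Return basic accuracy statistics for a list of Likert ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    n = len(ratings)
    return {
        "n": n,
        # Share of responses rated fully correct.
        "correct_rate": counts[3] / n,
        # Mean of the per-response weights.
        "weighted_accuracy": sum(LIKERT_WEIGHTS[r] for r in ratings) / n,
    }


def accuracy_by_group(records):
    """Summarize ratings per group (e.g. disease or organ system).

    `records` is an iterable of (group_name, rating) pairs.
    """
    grouped = {}
    for group, rating in records:
        grouped.setdefault(group, []).append(rating)
    return {group: summarize(ratings) for group, ratings in grouped.items()}
```

Calling `summarize` once per AI system gives the overall figures, and `accuracy_by_group` gives the disease-specific or organ-system breakdown. Under a different weighting scheme the weighted accuracy would change while the correct-response rate would not.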

RESULTS

ChatGPT 4 outperformed the other systems, showing the highest rate of correct responses (83.77%) and the best overall weighted accuracy (0.6775). Disease-specific accuracy varied notably across systems, with some diseases being accurately recognized, while others demonstrated significant discrepancies. Organ system accuracy also showed variable results, underscoring system-specific strengths and weaknesses.

CONCLUSION

ChatGPT 4 demonstrates superior reliability in medical dosage information, yet variations across diseases emphasize the need for ongoing improvements. These results highlight AI's potential in aiding healthcare professionals, urging continuous development for dependable accuracy in critical medical situations.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd54/11287534/a8a2287c8c41/92802-g001.jpg
