Accuracy and Safety of ChatGPT-3.5 in Assessing Over-the-Counter Medication Use During Pregnancy: A Descriptive Comparative Study.

Author information

Cornelison Bernadette, Axon David R, Abbott Bryan, Bishop Carter, Jebara Cindy, Kumar Anjali, Root Kristen A

Affiliation

R. Ken Coit College of Pharmacy, University of Arizona, 1295 N. Martin Ave., Tucson, AZ 85721, USA.

Publication information

Pharmacy (Basel). 2025 Jul 30;13(4):104. doi: 10.3390/pharmacy13040104.

DOI:10.3390/pharmacy13040104

PMID:40863701

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12389367/

Abstract

As artificial intelligence (AI) is increasingly used to perform tasks that require human intelligence, patients who are pregnant may turn to AI for advice on over-the-counter (OTC) medications. However, medications used in pregnancy may pose serious safety concerns, and the data available to assess those concerns are limited. This study examines a chatbot's ability to provide accurate information about OTC medications for patients who are pregnant. A prospective, descriptive design was used to compare the responses generated by the Chat Generative Pre-Trained Transformer 3.5 (ChatGPT-3.5) with the information provided by UpToDate. Eighty-seven of the top pharmacist-recommended OTC drugs in the United States (U.S.), as identified by Pharmacy Times, were assessed for safe use in pregnancy using ChatGPT-3.5. A piloted, standard prompt was entered into ChatGPT-3.5, and the responses were recorded. Two groups independently rated the responses against UpToDate for correctness, completeness, and safety using a 5-point Likert scale. After the independent evaluations, the groups discussed their findings to reach a consensus, and a third independent investigator assigned the final ratings. For correctness, the median score was 5 (interquartile range [IQR]: 5-5). For completeness, the median score was 4 (IQR: 4-5). For safety, the median score was 5 (IQR: 5-5). Despite the high overall scores, the safety errors found in 9% of the evaluations (n = 8), including omissions that pose a risk of serious complications, currently render the chatbot an unsafe standalone resource for this purpose.
