
GP or ChatGPT? Ability of large language models (LLMs) to support general practitioners when prescribing antibiotics.

Author information

Ngoc Nguyen Oanh, Amin Doaa, Bennett James, Hetlevik Øystein, Malik Sara, Tout Andrew, Vornhagen Heike, Vellinga Akke

Affiliations

CARA Network, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland.

NIHR In Practice Fellow, Hull York Medical School, University of Hull, Hull HU6 7RX, UK.

Publication information

J Antimicrob Chemother. 2025 May 2;80(5):1324-1330. doi: 10.1093/jac/dkaf077.

DOI: 10.1093/jac/dkaf077
PMID: 40079276
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12046391/

Abstract

INTRODUCTION

Large language models (LLMs) are becoming ubiquitous and widely implemented. LLMs could also be used for diagnosis and treatment. National antibiotic prescribing guidelines are customized and informed by local laboratory data on antimicrobial resistance.

METHODS

Based on 24 vignettes with information on type of infection, gender, age group and comorbidities, GPs and LLMs were prompted to provide a treatment. Four countries (Ireland, UK, USA and Norway) were included and a GP from each country and six LLMs (ChatGPT, Gemini, Copilot, Mistral AI, Claude and Llama 3.1) were provided with the vignettes, including their location (country). Responses were compared with the country's national prescribing guidelines. In addition, limitations of LLMs such as hallucination, toxicity and data leakage were assessed.

RESULTS

GPs' answers to the vignettes showed high accuracy in relation to diagnosis (96%-100%) and yes/no antibiotic prescribing (83%-92%). GPs referenced (100%) and prescribed (58%-92%) according to national guidelines, but dose/duration of treatment was less accurate (50%-75%). Overall, the GPs' accuracy had a mean of 74%. LLMs scored high in relation to diagnosis (92%-100%), antibiotic prescribing (88%-100%) and the choice of antibiotic (59%-100%) but correct referencing often failed (38%-96%), in particular for the Norwegian guidelines (0%-13%). Data leakage was shown to be an issue as personal information was repeated in the models' responses to the vignettes.

CONCLUSIONS

LLMs may be safe to guide antibiotic prescribing in general practice. However, to interpret vignettes, apply national guidelines and prescribe the right dose and duration, GPs remain best placed.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8543/12046391/27adc185d3e4/dkaf077f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8543/12046391/c592ffaba0df/dkaf077f1.jpg

Similar articles

1
GP or ChatGPT? Ability of large language models (LLMs) to support general practitioners when prescribing antibiotics.
J Antimicrob Chemother. 2025 May 2;80(5):1324-1330. doi: 10.1093/jac/dkaf077.
2
"Dr. AI Will See You Now": How Do ChatGPT-4 Treatment Recommendations Align With Orthopaedic Clinical Practice Guidelines?
Clin Orthop Relat Res. 2024 Dec 1;482(12):2098-2106. doi: 10.1097/CORR.0000000000003234. Epub 2024 Sep 6.
3
Prevalence of Antibiotic Prescribing for Acute Respiratory Tract Infection in Telehealth Versus Face-to-Face Consultations: Cross-Sectional Analysis of General Practice Registrars' Clinical Practice.
J Med Internet Res. 2025 Mar 13;27:e60831. doi: 10.2196/60831.
4
Interventions to improve antibiotic prescribing practices for hospital inpatients.
Cochrane Database Syst Rev. 2017 Feb 9;2(2):CD003543. doi: 10.1002/14651858.CD003543.pub4.
5
Written information for patients (or parents of child patients) to reduce the use of antibiotics for acute upper respiratory tract infections in primary care.
Cochrane Database Syst Rev. 2016 Nov 25;11(11):CD011360. doi: 10.1002/14651858.CD011360.pub2.
6
Performance of 3 Conversational Generative Artificial Intelligence Models for Computing Maximum Safe Doses of Local Anesthetics: Comparative Analysis.
JMIR AI. 2025 May 13;4:e66796. doi: 10.2196/66796.
7
Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.
JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857.
8
Immediate versus delayed versus no antibiotics for respiratory infections.
Cochrane Database Syst Rev. 2023 Oct 4;10(10):CD004417. doi: 10.1002/14651858.CD004417.pub6.
9
Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.
Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.
10
Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.
Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3.

Cited by

1
Comparative analysis of the performance of the large language models DeepSeek-V3, DeepSeek-R1, open AI-O3 mini and open AI-O3 mini high in urology.
World J Urol. 2025 Jul 7;43(1):416. doi: 10.1007/s00345-025-05757-4.
