Evaluating the Performance and Safety of Large Language Models in Generating Type 2 Diabetes Mellitus Management Plans: A Comparative Study With Physicians Using Real Patient Records.

Authors

Mondal Agnibho, Naskar Arindam, Roy Choudhury Bhaskar, Chakraborty Sambudhya, Biswas Tanmay, Sinha Sumanta, Roy Sasmit

Affiliations

Department of Infectious Diseases and Advanced Microbiology, School of Tropical Medicine, Kolkata, IND.

Department of Endocrinology, Nutrition and Metabolic Diseases, School of Tropical Medicine, Kolkata, IND.

Publication information

Cureus. 2025 Mar 17;17(3):e80737. doi: 10.7759/cureus.80737. eCollection 2025 Mar.

DOI: 10.7759/cureus.80737
PMID: 40248538
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12003111/
Abstract

Background: The integration of large language models (LLMs) such as GPT-4 into healthcare presents potential benefits and challenges. While LLMs show promise in applications ranging from scientific writing to personalized medicine, their practical utility and safety in clinical settings remain under scrutiny. Concerns about accuracy, ethical considerations, and bias necessitate rigorous evaluation of these technologies against established medical standards.

Methods: This study involved a comparative analysis using anonymized patient records from a healthcare setting in the state of West Bengal, India. Management plans for 50 patients with type 2 diabetes mellitus were generated by GPT-4 and three physicians, who were blinded to each other's responses. These plans were evaluated against a reference management plan based on American Diabetes Society guidelines. Completeness, necessity, and dosage accuracy were quantified, and a Prescribing Error Score was devised to assess the quality of the generated management plans. The safety of the management plans generated by GPT-4 was also assessed.

Results: Physicians' management plans had fewer missing medications than those generated by GPT-4 (p=0.008). However, GPT-4-generated management plans included fewer unnecessary medications (p=0.003). No significant difference was observed in the accuracy of drug dosages (p=0.975). The overall error scores were comparable between physicians and GPT-4 (p=0.301). Safety issues were noted in 16% of the plans generated by GPT-4, highlighting potential risks associated with AI-generated management plans.

Conclusion: The study demonstrates that while GPT-4 can effectively reduce unnecessary drug prescriptions, it does not yet match the performance of physicians in terms of plan completeness. The findings support the use of LLMs as supplementary tools in healthcare, highlighting the need for enhanced algorithms and continuous human oversight to ensure the efficacy and safety of artificial intelligence in clinical settings.
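The abstract names three quantified dimensions (completeness, necessity, dosage accuracy) but does not give the formula for the Prescribing Error Score. The sketch below is only an illustration of how a candidate plan could be compared against a reference plan along those three dimensions. The function names, the data layout (drug name mapped to daily dose), the placeholder drug names, and the unweighted error count are all assumptions made for this example, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanComparison:
    """Differences between a candidate plan and a reference plan."""
    missing: frozenset          # in the reference, absent from the candidate
    unnecessary: frozenset      # in the candidate, absent from the reference
    dosage_mismatch: frozenset  # in both plans, but with different doses


def _normalize(plan: dict) -> dict:
    # Hypothetical normalization: compare drug names case-insensitively.
    return {name.strip().lower(): dose for name, dose in plan.items()}


def compare_plan(candidate: dict, reference: dict) -> PlanComparison:
    """Compare two plans given as {drug name: daily dose} mappings."""
    cand = _normalize(candidate)
    ref = _normalize(reference)
    shared = ref.keys() & cand.keys()
    return PlanComparison(
        missing=frozenset(ref.keys() - cand.keys()),
        unnecessary=frozenset(cand.keys() - ref.keys()),
        dosage_mismatch=frozenset(n for n in shared if cand[n] != ref[n]),
    )


def error_count(result: PlanComparison) -> int:
    # Assumed scoring: one point per discrepancy, with no weighting.
    # The study's actual Prescribing Error Score may be defined differently.
    return (len(result.missing)
            + len(result.unnecessary)
            + len(result.dosage_mismatch))


if __name__ == "__main__":
    # Placeholder drug names and doses, not clinical data.
    reference = {"drug_a": 500, "drug_b": 10}
    candidate = {"Drug_A": 1000, "drug_c": 5}
    result = compare_plan(candidate, reference)
    print(sorted(result.missing), sorted(result.unnecessary),
          sorted(result.dosage_mismatch), error_count(result))
```

Per-plan counts of this kind, computed for each GPT-4 and physician plan, are the sort of quantities a group comparison like the one reported in the Results could be run on.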

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f67/12003111/5af288372e07/cureus-0017-00000080737-i02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f67/12003111/8cafd50e35b9/cureus-0017-00000080737-i01.jpg
