Identifying Deprescribing Opportunities With Large Language Models in Older Adults: Retrospective Cohort Study.

Authors

Socrates Vimig, Wright Donald S, Huang Thomas, Fereydooni Soraya, Dien Christine, Chi Ling, Albano Jesse, Patterson Brian, Sasidhar Kanaparthy Naga, Wright Catherine X, Loza Andrew, Chartash David, Iscoe Mark, Taylor Richard Andrew

Affiliations

Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, United States.

Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States.

Publication information

JMIR Aging. 2025 Apr 11;8:e69504. doi: 10.2196/69504.

DOI: 10.2196/69504
PMID: 40215480
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12032504/

Abstract

BACKGROUND

Polypharmacy, the concurrent use of multiple medications, is prevalent among older adults and associated with increased risks for adverse drug events including falls. Deprescribing, the systematic process of discontinuing potentially inappropriate medications, aims to mitigate these risks. However, the practical application of deprescribing criteria in emergency settings remains limited due to time constraints and criteria complexity.

OBJECTIVE

This study aims to evaluate the performance of a large language model (LLM)-based pipeline in identifying deprescribing opportunities for older emergency department (ED) patients with polypharmacy, using 3 different sets of criteria: Beers, Screening Tool of Older People's Prescriptions, and Geriatric Emergency Medication Safety Recommendations. The study further evaluates LLM confidence calibration and its ability to improve recommendation performance.

METHODS

We conducted a retrospective cohort study of older adults presenting to an ED in a large academic medical center in the Northeast United States from January 2022 to March 2022. A random sample of 100 patients (712 total oral medications) was selected for detailed analysis. The LLM pipeline consisted of two steps: (1) filtering high-yield deprescribing criteria based on patients' medication lists, and (2) applying these criteria using both structured and unstructured patient data to recommend deprescribing. Model performance was assessed by comparing model recommendations to those of trained medical students, with discrepancies adjudicated by board-certified ED physicians. Selective prediction, a method that allows a model to abstain from low-confidence predictions to improve overall reliability, was applied to assess the model's confidence and decision-making thresholds.
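To illustrate the selective prediction idea described above, here is a minimal Python sketch. It is not the study's implementation: the `Prediction` class, its fields, the thresholds, and the example data are all hypothetical. It only shows how abstaining below a confidence threshold trades coverage (the share of cases the model still answers) against accuracy on the answered cases.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    confidence: float  # model-reported confidence in [0, 1] (hypothetical field)
    correct: bool      # whether the recommendation matched the adjudicated reference


def selective_accuracy(preds, threshold):
    """Return (coverage, accuracy) when the model abstains below `threshold`.

    Accuracy is None when every prediction is rejected.
    """
    kept = [p for p in preds if p.confidence >= threshold]
    coverage = len(kept) / len(preds) if preds else 0.0
    accuracy = sum(p.correct for p in kept) / len(kept) if kept else None
    return coverage, accuracy


# Made-up example data, not taken from the study.
preds = [
    Prediction(0.95, True),
    Prediction(0.90, False),
    Prediction(0.80, True),
    Prediction(0.60, False),
    Prediction(0.55, True),
]

for threshold in (0.0, 0.7, 0.92):
    print(threshold, selective_accuracy(preds, threshold))
```

If confidence is well calibrated, accuracy on the retained cases should rise as the threshold rises. When confidence is poorly calibrated, raising the threshold mostly reduces coverage without a comparable gain in accuracy.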

RESULTS

The LLM was significantly more effective than medical students in identifying deprescribing criteria (positive predictive value: 0.83; negative predictive value: 0.93; McNemar test for paired proportions: χ²=5.985; P=.02), but showed limitations in making specific deprescribing recommendations (positive predictive value=0.47; negative predictive value=0.93). Adjudication revealed that while the model excelled at identifying when there was a deprescribing criterion related to one of the patient's medications, it often struggled with determining whether that criterion applied to the specific case due to complex inclusion and exclusion criteria (54.5% of errors) and ambiguous clinical contexts (eg, missing information; 39.3% of errors). Selective prediction only marginally improved LLM performance due to poorly calibrated confidence estimates.
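For readers less familiar with the metrics reported above, the following Python sketch shows how positive and negative predictive values and a McNemar statistic are typically computed. All counts are hypothetical and are not the study's data. The function names are illustrative, and the McNemar variant shown (no continuity correction, chi-square with 1 degree of freedom) is an assumption, since the abstract does not state which variant was used.

```python
import math


def ppv_npv(tp, fp, tn, fn):
    """Positive and negative predictive value from confusion-matrix counts.

    Assumes at least one predicted positive and one predicted negative.
    """
    ppv = tp / (tp + fp)  # share of positive calls that were correct
    npv = tn / (tn + fn)  # share of negative calls that were correct
    return ppv, npv


def mcnemar_chi2(b, c):
    """McNemar statistic (no continuity correction) and its p-value.

    b and c are the discordant pair counts: cases where only rater A was
    correct and cases where only rater B was correct. Assumes b + c > 0.
    """
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only.
print(ppv_npv(tp=40, fp=10, tn=45, fn=5))
print(mcnemar_chi2(b=15, c=5))
```

Only the discordant pairs enter the McNemar statistic, which is why it suits paired comparisons such as two raters judging the same set of cases.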

CONCLUSIONS

This study highlights the potential of LLMs to support deprescribing decisions in the ED by effectively filtering relevant criteria. However, challenges remain in applying these criteria to complex clinical scenarios, as the LLM demonstrated poor performance on more intricate decision-making tasks, with its reported confidence often failing to align with its actual success in these cases. The findings underscore the need for clearer deprescribing guidelines, improved LLM calibration for real-world use, and better integration of human-artificial intelligence workflows to balance artificial intelligence recommendations with clinician judgment.
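One common way to quantify the mismatch between reported confidence and actual success is the expected calibration error (ECE). The Python sketch below is illustrative only: the abstract does not say which calibration measure the authors used, and the function name, bin count, and inputs are assumptions.

```python
def expected_calibration_error(confidences, outcomes, n_bins=5):
    """Expected calibration error with equal-width confidence bins.

    confidences: model-reported confidences in [0, 1]
    outcomes:    1 if the prediction was correct, else 0
    """
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, outcomes):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(h for _, h in items) / len(items)
        # Weight each bin's confidence-accuracy gap by its share of cases.
        ece += len(items) / total * abs(mean_conf - accuracy)
    return ece


# Made-up example: the model reports 0.9 confidence but is right half the time.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0]))
```

A value near 0 means confidence tracks observed accuracy. Larger values indicate over- or underconfidence.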
