A Large Language Model Screening Tool to Target Patients for Best Practice Alerts: Development and Validation.

Authors

Savage Thomas, Wang John, Shieh Lisa

Affiliations

Division of Hospital Medicine, Department of Medicine, Stanford University, Palo Alto, CA, United States.

Division of Gastroenterology and Hepatology, Department of Medicine, Stanford University, Palo Alto, CA, United States.

Publication information

JMIR Med Inform. 2023 Nov 27;11:e49886. doi: 10.2196/49886.

DOI:10.2196/49886
PMID:38010803
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10714262/
Abstract

BACKGROUND

Best Practice Alerts (BPAs) are alert messages to physicians in the electronic health record that are used to encourage appropriate use of health care resources. While these alerts are helpful in both improving care and reducing costs, BPAs are often broadly applied nonselectively across entire patient populations. The development of large language models (LLMs) provides an opportunity to selectively identify patients for BPAs.

OBJECTIVE

In this paper, we present an example case where an LLM screening tool is used to select patients appropriate for a BPA encouraging the prescription of deep vein thrombosis (DVT) anticoagulation prophylaxis. The artificial intelligence (AI) screening tool was developed to identify patients experiencing acute bleeding and exclude them from receiving a DVT prophylaxis BPA.

METHODS

Our AI screening tool used a BioMed-RoBERTa (Robustly Optimized Bidirectional Encoder Representations from Transformers Pretraining Approach; AllenAI) model to perform classification of physician notes, identifying patients without active bleeding and thus appropriate for a thromboembolism prophylaxis BPA. The BioMed-RoBERTa model was fine-tuned using 500 history and physical notes of patients from the MIMIC-III (Medical Information Mart for Intensive Care) database who were not prescribed anticoagulation. A development set of 300 MIMIC patient notes was used to determine the model's hyperparameters, and a separate test set of 300 patient notes was used to evaluate the screening tool.
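The screening step described above can be sketched as a simple gate in front of the alert. This is a minimal illustration, not the authors' implementation: `BleedingScorer`, `ScreeningDecision`, `screen_patients`, and the 0.5 cutoff are hypothetical names and values introduced here. In the paper, the scorer's role is played by the fine-tuned BioMed-RoBERTa classifier, and the abstract does not state a decision threshold.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical type: any function that maps a note's text to an estimated
# probability of active bleeding. A fine-tuned classifier would fill this role.
BleedingScorer = Callable[[str], float]


@dataclass
class ScreeningDecision:
    patient_id: str
    p_bleeding: float
    send_bpa: bool  # True when the DVT prophylaxis alert should fire


def screen_patients(
    notes: Iterable[Tuple[str, str]],
    scorer: BleedingScorer,
    threshold: float = 0.5,  # illustrative assumption, not a value from the paper
) -> List[ScreeningDecision]:
    """Suppress the alert for patients whose note is scored as likely bleeding."""
    decisions = []
    for patient_id, note_text in notes:
        p = scorer(note_text)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"scorer returned a non-probability: {p}")
        # The alert is sent only when bleeding is judged unlikely.
        decisions.append(ScreeningDecision(patient_id, p, send_bpa=p < threshold))
    return decisions


if __name__ == "__main__":
    # Stand-in scorer with made-up probabilities, used only to exercise the gate.
    fake_scores = {"note A": 0.91, "note B": 0.07}
    for d in screen_patients([("p1", "note A"), ("p2", "note B")], fake_scores.get):
        print(d.patient_id, "send alert" if d.send_bpa else "suppress alert")
```

Keeping the scorer behind a plain callable means the gate logic can be checked without loading a model. The cutoff would in practice be chosen on a development set such as the 300-note set the authors describe.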

RESULTS

Our MIMIC-III test set of 300 patients included 72 patients with bleeding (ie, not appropriate for a DVT prophylaxis BPA) and 228 without bleeding who were appropriate for a DVT prophylaxis BPA. The AI screening tool achieved an area under the precision-recall curve of 0.82 (95% CI 0.75-0.89) and an area under the receiver operating characteristic curve of 0.89 (95% CI 0.84-0.94). Compared to nonselectively sending alerts for all patients, the screening tool reduced the number of patients who would trigger an alert by 20% (240 instead of 300 alerts) and increased alert applicability by 14.8 percentage points (218 [90.8%] positive alerts from 240 total alerts instead of 228 [76%] positive alerts from 300 total alerts).
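The headline figures in the results follow directly from the reported counts, and the short calculation below reproduces them. The function name `summarize_alerts` is introduced here for illustration. The breakdown of suppressed alerts (how many bleeding and non-bleeding patients were screened out) is not reported in the abstract; it is only what the stated counts imply.

```python
def summarize_alerts(
    total_patients: int,
    non_bleeding: int,
    alerts_sent: int,
    appropriate_alerts: int,
) -> dict:
    """Derive alert-volume and applicability figures from raw counts."""
    bleeding = total_patients - non_bleeding
    bleeding_still_alerted = alerts_sent - appropriate_alerts
    applicability_unscreened = non_bleeding / total_patients
    applicability_screened = appropriate_alerts / alerts_sent
    return {
        # Share of alerts avoided compared with alerting every patient.
        "alert_reduction_pct": 100 * (total_patients - alerts_sent) / total_patients,
        # Share of alerts that reach a patient without bleeding.
        "applicability_unscreened_pct": 100 * applicability_unscreened,
        "applicability_screened_pct": 100 * applicability_screened,
        # The gain is a difference of two percentages (percentage points).
        "applicability_gain_points": 100
        * (applicability_screened - applicability_unscreened),
        # Implied by the counts above, not stated in the abstract:
        "bleeding_patients_excluded": bleeding - bleeding_still_alerted,
        "bleeding_patients_still_alerted": bleeding_still_alerted,
        "non_bleeding_patients_excluded": non_bleeding - appropriate_alerts,
    }


if __name__ == "__main__":
    # Counts taken from the results: 300 patients, 228 without bleeding,
    # 240 alerts after screening, 218 of them for patients without bleeding.
    summary = summarize_alerts(300, 228, 240, 218)
    for name, value in summary.items():
        print(f"{name}: {value:.1f}" if isinstance(value, float) else f"{name}: {value}")
```

With the reported counts this gives a 20.0% reduction in alerts and applicability of 76.0% versus 90.8%, a gain of 14.8 percentage points. The counts also imply that 50 of the 72 bleeding patients were screened out, while 10 of the 228 patients without bleeding lost an alert they should have received.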

CONCLUSIONS

These results show a proof of concept on how language models can be used as a screening tool for BPAs. We provide an example AI screening tool that uses a HIPAA (Health Insurance Portability and Accountability Act)-compliant BioMed-RoBERTa model deployed with minimal computing power. Larger models (eg, Generative Pre-trained Transformers-3, Generative Pre-trained Transformers-4, and Pathways Language Model) will exhibit superior performance but require data use agreements to be HIPAA compliant. We anticipate LLMs to revolutionize quality improvement in hospital medicine.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0035/10714262/126fc666c43a/medinform_v11i1e49886_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0035/10714262/8a6a92a796e0/medinform_v11i1e49886_fig2.jpg