Large Language Models for More Efficient Reporting of Hospital Quality Measures.

Authors

Boussina Aaron, Krishnamoorthy Rishivardhan, Quintero Kimberly, Joshi Shreyansh, Wardi Gabriel, Pour Hayden, Hilbert Nicholas, Malhotra Atul, Hogarth Michael, Sitapati Amy M, VanDenBerg Chad, Singh Karandeep, Longhurst Christopher A, Nemati Shamim

Affiliations

Division of Biomedical Informatics, University of California, San Diego, San Diego.

Department of Quality, University of California, San Diego, San Diego.

Publication information

NEJM AI. 2024 Oct 24;1(11). doi: 10.1056/aics2400420. Epub 2024 Oct 21.

DOI:10.1056/aics2400420

PMID:39703686

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11658346/

Abstract

Hospital quality measures are a vital component of a learning health system, yet they can be costly to report, statistically underpowered, and inconsistent due to poor interrater reliability. Large language models (LLMs) have recently demonstrated impressive performance on health care-related tasks and offer a promising way to provide accurate abstraction of complete charts at scale. To evaluate this approach, we deployed an LLM-based system that ingests Fast Healthcare Interoperability Resources data and outputs a completed Severe Sepsis and Septic Shock Management Bundle (SEP-1) abstraction. We tested the system on a sample of 100 manual SEP-1 abstractions that University of California San Diego Health reported to the Centers for Medicare & Medicaid Services in 2022. The LLM system achieved agreement with manual abstractors on the measure category assignment in 90 of the abstractions (90%; κ=0.82; 95% confidence interval, 0.71 to 0.92). Expert review of the 10 discordant cases identified four that were mistakes introduced by manual abstraction. This pilot study suggests that LLMs using interoperable electronic health record data may perform accurate abstractions for complex quality measures. (Funded by the National Institute of Allergy and Infectious Diseases [1R42AI177108-1] and others.)
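
The agreement statistic quoted in the abstract, Cohen's κ, corrects the raw agreement rate for the agreement two raters would reach by chance. The following is a minimal sketch of that calculation in Python, not the authors' analysis code. The function name `cohen_kappa`, the category labels, and the counts are hypothetical toy data chosen only to give 90% raw agreement. They are not the study's data, so the resulting κ is not expected to match the reported 0.82.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each category, the product of the two raters'
    # marginal rates, summed over all categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one shared label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (placeholders, not real SEP-1 measure categories).
manual = ["pass"] * 40 + ["fail"] * 30 + ["excluded"] * 30
llm = (
    ["pass"] * 36 + ["fail"] * 4          # cases the manual abstractor marked "pass"
    + ["fail"] * 27 + ["pass"] * 3        # cases marked "fail"
    + ["excluded"] * 27 + ["fail"] * 3    # cases marked "excluded"
)

raw_agreement = sum(m == l for m, l in zip(manual, llm)) / len(manual)
kappa = cohen_kappa(manual, llm)
print(f"raw agreement = {raw_agreement:.2f}, kappa = {kappa:.3f}")
```

Because κ depends on how cases are distributed across categories, the same 90% raw agreement can correspond to different κ values. This is why the abstract reports both figures together with a confidence interval.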

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3db6/11658346/00794066df75/nihms-2031638-f0001.jpg
