Local large language models for privacy-preserving accelerated review of historic echocardiogram reports.

Affiliations

The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States.

The Division of Data Driven and Digital Medicine (D3M), Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States.

Publication information

J Am Med Inform Assoc. 2024 Sep 1;31(9):2097-2102. doi: 10.1093/jamia/ocae085.

DOI:10.1093/jamia/ocae085

PMID:38687616

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11339495/

Abstract

OBJECTIVES

The study developed a framework that leverages an open-source Large Language Model (LLM) to let clinicians ask plain-language questions about a patient's entire echocardiogram report history. This approach is intended to streamline the extraction of clinical insights from multiple echocardiogram reports, particularly in patients with complex cardiac diseases, thereby enhancing both patient care and research efficiency.

MATERIALS AND METHODS

Data from over 10 years were collected, comprising echocardiogram reports from patients with more than 10 echocardiograms on file at the Mount Sinai Health System. Each patient's reports were combined into a single document for analysis and broken down into snippets; relevant snippets were then retrieved using text similarity measures. The LLaMA-2 70B model was used to analyze the text with a specially crafted prompt. The model's performance was evaluated against ground-truth answers created by faculty cardiologists.
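
The retrieval step described above can be illustrated with a minimal sketch. The abstract does not specify the snippet size, the similarity measure, or the prompt wording, so everything here is an assumption made for illustration: the function names, the fixed 40-word window, the bag-of-words cosine similarity (a simple stand-in for whatever "text similarity measures" the authors used), and the prompt template are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_snippets(document, max_words=40):
    """Break one patient's combined report text into fixed-size word windows.

    The 40-word window is an assumed value, not taken from the study.
    """
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, snippets, top_k=3):
    """Return the top_k snippets most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        snippets,
        key=lambda s: cosine_similarity(q, Counter(tokenize(s))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, snippets):
    """Assemble a hypothetical prompt; the study's actual prompt is not given."""
    context = "\n---\n".join(snippets)
    return (
        "Answer using only the echocardiogram report excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

In this sketch, the string returned by `build_prompt` would be passed to a locally hosted model, so report text never leaves the institution. Restricting the prompt to retrieved snippets keeps a long report history within the model's context window.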

RESULTS

The study analyzed 432 reports from 37 patients, yielding a total of 100 question-answer pairs. The LLM correctly answered 90% of the questions, with accuracies of 83% for temporality, 93% for severity assessment, 84% for intervention identification, and 100% for diagnosis retrieval. Errors mainly stemmed from the LLM's inherent limitations, such as misinterpreting numbers or hallucinations.

CONCLUSION

The study demonstrates the feasibility and effectiveness of using a local, open-source LLM for querying and interpreting echocardiogram report data. This approach offers a significant improvement over traditional keyword-based searches, enabling more contextually relevant and semantically accurate responses. It therefore shows promise for enhancing clinical decision-making and research by providing more efficient access to complex patient data.
