Sharabiani Mansour, Mahani Alireza, Bottle Alex, Srinivasan Yadav, Issitt Richard, Stoica Serban
School of Public Health, Imperial College London, London, UK.
New York Stock Exchange, New York, United States.
Sci Rep. 2025 Jul 1;15(1):20847. doi: 10.1038/s41598-025-04651-8.
The emergence of large language models (LLMs) opens new horizons for leveraging often-unused information in clinical text. Our study aims to capitalise on this potential. Specifically, we examine the utility of text embeddings generated by LLMs in predicting postoperative acute kidney injury (AKI) in paediatric cardiopulmonary bypass (CPB) patients using electronic health record (EHR) text, and propose methods for explaining their output. AKI can be a serious complication of paediatric CPB, and its accurate prediction can significantly improve patient outcomes by enabling timely interventions. We evaluate various text-embedding algorithms, including Doc2Vec, top-performing sentence transformers on Hugging Face, and commercial LLMs from Google and OpenAI. We benchmark the cross-validated performance of these 'AI models' against a 'baseline model' as well as an established, clinically defined 'expert model'. The baseline model includes structured features, i.e., patient gender, age, height, body mass index and length of operation. The majority of AI models surpass not only the baseline model but also the expert model. An ensemble of the AI and clinical-expert models improves discriminative performance by 23% compared to the baseline model. The consistency of patient clusters formed from AI-generated embeddings with clinical-expert clusters, measured via the adjusted Rand index and adjusted mutual information metrics, illustrates the medical validity of LLM embeddings. We create a reverse mapping from the numeric embedding space to the natural-language domain via the embedding-based clusters, generating medical labels for the clusters in the process. We also use text-generating LLMs to summarise the differences between the AI and expert clusters. Such 'explainability' outputs can increase medical practitioners' trust in AI applications and help generate new hypotheses, e.g., by studying the association between cluster memberships and outcomes of interest.
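The cluster-consistency comparison described above can be illustrated with a minimal, from-scratch sketch of the adjusted Rand index over two label assignments (one from embedding-based clusters, one from expert clusters). This is an assumption-laden illustration of the metric itself, not the authors' implementation; in practice an established library routine (e.g. from scikit-learn) would be used, and the function name and toy labels below are invented for the example.

```python
# Illustrative sketch only: adjusted Rand index (ARI) between two clusterings,
# e.g. AI-embedding-derived patient clusters vs. clinical-expert clusters.
# Not the paper's code; labels and names are hypothetical.
from math import comb
from collections import Counter

def adjusted_rand_index(labels_a, labels_b):
    """ARI via the contingency-table formula; 1.0 = identical partitions,
    ~0.0 = chance-level agreement (can be negative)."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))   # contingency cells n_ij
    a_counts = Counter(labels_a)                     # row sums a_i
    b_counts = Counter(labels_b)                     # column sums b_j
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())
    expected = sum_a * sum_b / comb(n, 2)            # chance-expected index
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                        # degenerate partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Identical partitions (up to relabelling) score 1.0:
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

Note that ARI is invariant to label permutation, which matters here because AI-derived and expert-defined clusters carry unrelated label names; the adjusted mutual information metric mentioned in the abstract plays an analogous, entropy-based role.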