Foresight-a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study.

Author information

Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; National Institute for Health and Care Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.

Department of Neurology, King's College Hospital National Health Service (NHS) Foundation Trust, London, UK; Guy's and St Thomas' NHS Foundation Trust, London, UK.

Publication information

Lancet Digit Health. 2024 Apr;6(4):e281-e290. doi: 10.1016/S2589-7500(24)00025-6.

DOI:10.1016/S2589-7500(24)00025-6

PMID:38519155

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11220626/

Abstract

BACKGROUND

An electronic health record (EHR) holds detailed longitudinal information about a patient's health status and general clinical history, a large portion of which is stored as unstructured, free text. Existing approaches to model a patient's trajectory focus mostly on structured data and a subset of single-domain outcomes. This study aims to evaluate the effectiveness of Foresight, a generative transformer in temporal modelling of patient data, integrating both free text and structured formats, to predict a diverse array of future medical outcomes, such as disorders, substances (eg, to do with medicines, allergies, or poisonings), procedures, and findings (eg, relating to observations, judgements, or assessments).

METHODS

Foresight is a novel transformer-based pipeline that uses named entity recognition and linking tools to convert EHR document text into structured, coded concepts, followed by providing probabilistic forecasts for future medical events, such as disorders, substances, procedures, and findings. The Foresight pipeline has four main components: (1) CogStack (data retrieval and preprocessing); (2) the Medical Concept Annotation Toolkit (structuring of the free-text information from EHRs); (3) Foresight Core (deep-learning model for biomedical concept modelling); and (4) the Foresight web application. We processed the entire free-text portion from three different hospital datasets (King's College Hospital [KCH], South London and Maudsley [SLaM], and the US Medical Information Mart for Intensive Care III [MIMIC-III]), resulting in information from 811 336 patients and covering both physical and mental health institutions. We measured the performance of models using custom metrics derived from precision and recall.
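The abstract does not describe how the coded concepts are arranged before modelling, so the following is only a minimal sketch of one plausible way to turn annotation output into a per-patient timeline. The `Annotation` record, the `build_timeline` helper, the choice to keep only the first mention of each concept, and the plain-text concept labels are all assumptions made for illustration; they are not the CogStack, Medical Concept Annotation Toolkit, or Foresight Core interfaces.

```python
from datetime import date
from typing import NamedTuple


class Annotation(NamedTuple):
    """One coded concept extracted from a dated EHR document (hypothetical shape)."""
    patient_id: str
    doc_date: date
    concept: str   # placeholder label, not a real terminology code
    category: str  # "disorder", "substance", "procedure" or "finding"


def build_timeline(annotations, patient_id):
    """Return one patient's concepts in date order, keeping first mentions only.

    Keeping only the first mention is an assumption of this sketch, chosen so
    that every later element of the timeline is a *new* concept.
    """
    own = [a for a in annotations if a.patient_id == patient_id]
    own.sort(key=lambda a: a.doc_date)  # stable sort keeps same-day input order
    seen = set()
    timeline = []
    for a in own:
        if a.concept not in seen:
            seen.add(a.concept)
            timeline.append((a.doc_date, a.category, a.concept))
    return timeline


# Toy, made-up annotations for two patients.
annotations = [
    Annotation("p1", date(2020, 3, 1), "hypertension", "disorder"),
    Annotation("p1", date(2019, 1, 5), "chest pain", "finding"),
    Annotation("p1", date(2020, 3, 1), "chest pain", "finding"),
    Annotation("p2", date(2018, 7, 9), "asthma", "disorder"),
    Annotation("p1", date(2021, 6, 2), "amlodipine", "substance"),
]

timeline = build_timeline(annotations, "p1")
for when, category, concept in timeline:
    print(when.isoformat(), category, concept)
```

A sequence model can then be trained to predict each element of such a timeline from the elements before it, which is the general setting the abstract describes.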

FINDINGS

Foresight achieved a precision@10 (ie, of 10 forecasted candidates, at least one is correct) of 0·68 (SD 0·0027) for the KCH dataset, 0·76 (0·0032) for the SLaM dataset, and 0·88 (0·0018) for the MIMIC-III dataset, for forecasting the next new disorder in a patient timeline. Foresight also achieved a precision@10 value of 0·80 (0·0013) for the KCH dataset, 0·81 (0·0026) for the SLaM dataset, and 0·91 (0·0011) for the MIMIC-III dataset, for forecasting the next new biomedical concept. In addition, Foresight was validated on 34 synthetic patient timelines by five clinicians and achieved a relevancy of 33 (97% [95% CI 91-100]) of 34 for the top forecasted candidate disorder. As a generative model, Foresight can forecast follow-on biomedical concepts for as many steps as required.
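The abstract glosses precision@10 as "of 10 forecasted candidates, at least one is correct". The sketch below computes a metric of that shape under a simplifying assumption: each evaluation example has one true next concept and a ranked list of forecasted candidates. The function names and toy data are hypothetical, and the study's actual custom metrics (described only as derived from precision and recall) may differ in detail.

```python
def hit_at_k(ranked_candidates, true_concept, k=10):
    """True if the true next concept is among the top-k forecasted candidates."""
    return true_concept in ranked_candidates[:k]


def precision_at_k(examples, k=10):
    """Fraction of examples whose true next concept appears in the top k.

    `examples` is a list of (ranked_candidates, true_concept) pairs.
    """
    if not examples:
        raise ValueError("at least one example is required")
    hits = sum(hit_at_k(ranked, truth, k) for ranked, truth in examples)
    return hits / len(examples)


# Toy, made-up forecasts: each list is ordered from most to least probable.
examples = [
    (["a", "b", "c"], "b"),  # correct concept ranked 2nd
    (["a", "b", "c"], "d"),  # correct concept not forecast at all
    (["x", "y", "z"], "x"),  # correct concept ranked 1st
    (["x", "y", "z"], "z"),  # correct concept ranked 3rd
]

for k in (1, 2, 3):
    print(f"precision@{k} = {precision_at_k(examples, k):.2f}")
```

Under this definition the score can only stay the same or rise as k grows, so a precision@10 value should be read alongside the k it was computed for.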

INTERPRETATION

Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk forecasting, virtual trials, and clinical research to study the progression of disorders, to simulate interventions and counterfactuals, and for educational purposes.

FUNDING

National Health Service Artificial Intelligence Laboratory, National Institute for Health and Care Research Biomedical Research Centre, and Health Data Research UK.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faff/11220626/d980aec77744/gr1.jpg
