Foresight-a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study.

Author affiliations

Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; National Institute for Health and Care Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.

Department of Neurology, King's College Hospital National Health Service (NHS) Foundation Trust, London, UK; Guy's and St Thomas' NHS Foundation Trust, London, UK.

Publication information

Lancet Digit Health. 2024 Apr;6(4):e281-e290. doi: 10.1016/S2589-7500(24)00025-6.

Abstract

BACKGROUND

An electronic health record (EHR) holds detailed longitudinal information about a patient's health status and general clinical history, a large portion of which is stored as unstructured, free text. Existing approaches to modelling a patient's trajectory focus mostly on structured data and a subset of single-domain outcomes. This study aims to evaluate the effectiveness of Foresight, a generative transformer, in temporal modelling of patient data, integrating both free text and structured formats, to predict a diverse array of future medical outcomes, such as disorders, substances (eg, to do with medicines, allergies, or poisonings), procedures, and findings (eg, relating to observations, judgements, or assessments).

METHODS

Foresight is a novel transformer-based pipeline that uses named entity recognition and linking tools to convert EHR document text into structured, coded concepts, followed by providing probabilistic forecasts for future medical events, such as disorders, substances, procedures, and findings. The Foresight pipeline has four main components: (1) CogStack (data retrieval and preprocessing); (2) the Medical Concept Annotation Toolkit (structuring of the free-text information from EHRs); (3) Foresight Core (deep-learning model for biomedical concept modelling); and (4) the Foresight web application. We processed the entire free-text portion from three different hospital datasets (King's College Hospital [KCH], South London and Maudsley [SLaM], and the US Medical Information Mart for Intensive Care III [MIMIC-III]), resulting in information from 811 336 patients and covering both physical and mental health institutions. We measured the performance of models using custom metrics derived from precision and recall.
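As an illustration of the step between concept extraction and modelling, the sketch below groups already-extracted, coded concepts into one chronologically ordered sequence per patient, the kind of timeline a sequence model could be trained on. It is a minimal sketch under stated assumptions, not the authors' implementation: the `Annotation` record, the `build_timelines` function, and the concept labels are hypothetical, and the abstract does not describe how Foresight orders, filters, or tokenises concepts.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    """One coded concept extracted from a document (hypothetical record layout)."""
    patient_id: str
    when: date      # date of the source document
    concept: str    # coded concept label, e.g. a disorder or substance


def build_timelines(annotations: List[Annotation]) -> Dict[str, List[str]]:
    """Group annotations by patient and order each group chronologically."""
    timelines: Dict[str, List[str]] = {}
    # sorted() is stable, so concepts sharing a date keep their input order.
    for ann in sorted(annotations, key=lambda a: (a.patient_id, a.when)):
        timelines.setdefault(ann.patient_id, []).append(ann.concept)
    return timelines


# Illustrative data only; these are not records from any of the study datasets.
example = [
    Annotation("p1", date(2020, 5, 1), "type 2 diabetes"),
    Annotation("p1", date(2019, 1, 15), "hypertension"),
    Annotation("p2", date(2021, 3, 9), "asthma"),
]
timelines = build_timelines(example)
```

Each resulting list could then be fed to a generative model that forecasts the next concept in the sequence; how that model is built is outside the scope of this sketch.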

FINDINGS

Foresight achieved a precision@10 (ie, of 10 forecasted candidates, at least one is correct) of 0·68 (SD 0·0027) for the KCH dataset, 0·76 (0·0032) for the SLaM dataset, and 0·88 (0·0018) for the MIMIC-III dataset, for forecasting the next new disorder in a patient timeline. Foresight also achieved a precision@10 value of 0·80 (0·0013) for the KCH dataset, 0·81 (0·0026) for the SLaM dataset, and 0·91 (0·0011) for the MIMIC-III dataset, for forecasting the next new biomedical concept. In addition, Foresight was validated on 34 synthetic patient timelines by five clinicians and achieved a relevancy of 33 (97% [95% CI 91-100]) of 34 for the top forecasted candidate disorder. As a generative model, Foresight can forecast follow-on biomedical concepts for as many steps as required.
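The precision@10 definition given above (of 10 forecasted candidates, at least one is correct) can be sketched as follows. This is an illustrative reading of that one-line definition, not the study's evaluation code: the function names `hit_at_k` and `precision_at_k` and the example concepts are assumptions, and the study's custom metrics may differ in details the abstract does not state (for example, how "new" concepts are selected).

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], correct: Set[str], k: int = 10) -> bool:
    """True if at least one of the top-k forecasted candidates is correct."""
    return any(candidate in correct for candidate in ranked[:k])


def precision_at_k(forecasts: Sequence[Sequence[str]],
                   targets: Sequence[Set[str]],
                   k: int = 10) -> float:
    """Fraction of forecasts whose top-k candidates contain a correct concept."""
    if len(forecasts) != len(targets):
        raise ValueError("forecasts and targets must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    hits = sum(hit_at_k(r, t, k) for r, t in zip(forecasts, targets))
    return hits / len(forecasts)


# Illustrative data only: two forecasts, each ranked from most to least likely.
forecasts = [
    ["hypertension", "type 2 diabetes", "asthma"],
    ["migraine", "epilepsy"],
]
targets = [{"asthma"}, {"stroke"}]

score_at_10 = precision_at_k(forecasts, targets, k=10)  # first forecast hits
score_at_2 = precision_at_k(forecasts, targets, k=2)    # "asthma" is ranked third
```

Under this reading, a forecast counts as correct as soon as any of its top-k candidates matches, so the score can only stay the same or rise as k increases.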

INTERPRETATION

Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk forecasting, virtual trials, and clinical research to study the progression of disorders, to simulate interventions and counterfactuals, and for educational purposes.

FUNDING

National Health Service Artificial Intelligence Laboratory, National Institute for Health and Care Research Biomedical Research Centre, and Health Data Research UK.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faff/11220626/d980aec77744/gr1.jpg
