Longitudinal cohorts for harnessing the electronic health record for disease prediction in a US population.

Author affiliations

Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA.

Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA.

Publication information

BMJ Open. 2021 Jun 8;11(6):e044353. doi: 10.1136/bmjopen-2020-044353.

Abstract

PURPOSE

The depth and breadth of clinical data within electronic health record (EHR) systems, paired with innovative machine learning methods, can be leveraged to identify novel risk factors for complex diseases. However, analysing EHR data is challenging because of the complexity and quality of the data. Therefore, we developed large electronic population-based cohorts with comprehensive, harmonised and processed EHR data.

PARTICIPANTS

All individuals 30 years of age or older who resided in Olmsted County, Minnesota on 1 January 2006 were identified for the discovery cohort. Algorithms to define a variety of patient characteristics were developed and validated, thus building a comprehensive risk profile for each patient. Patients are followed for incident diseases and ageing-related outcomes. Using the same methods, an independent validation cohort was assembled by identifying all individuals 30 years of age or older who resided in the largely rural 26-county area of southern Minnesota and western Wisconsin on 1 January 2013.
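The eligibility rule described above (age 30 or older and resident in the catchment area on a fixed index date) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Person` record, its field names, and the `select_cohort` helper are hypothetical and assume that residency on the index date has already been resolved to a single county per person.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout, not taken from the study.
    person_id: str
    birth_date: date
    county: str  # county of residence on the index date


def age_on(birth_date: date, on: date) -> int:
    """Return completed years of age on the given date."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def select_cohort(people, index_date, counties, min_age=30):
    """Keep people living in one of `counties` and aged >= min_age on index_date."""
    return [
        p for p in people
        if p.county in counties and age_on(p.birth_date, index_date) >= min_age
    ]


# Illustrative data only; these are not study records.
people = [
    Person("a", date(1976, 1, 1), "Olmsted"),   # exactly 30 on the index date
    Person("b", date(1976, 1, 2), "Olmsted"),   # still 29 on the index date
    Person("c", date(1950, 6, 15), "Hennepin"),  # outside the catchment area
]
discovery = select_cohort(people, date(2006, 1, 1), {"Olmsted"})
print([p.person_id for p in discovery])
```

Under the same assumptions, the validation cohort would reuse `select_cohort` with an index date of 1 January 2013 and the set of 26 counties in place of `{"Olmsted"}`.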

FINDINGS TO DATE

For the discovery cohort, 76 255 individuals (median age 49; 53% women) were identified, for whom a total of 9 644 221 laboratory results, 9 513 840 diagnosis codes, 10 924 291 procedure codes, 1 277 231 outpatient drug prescriptions, 966 136 heart rate measurements and 1 159 836 blood pressure (BP) measurements were retrieved during the baseline time period. The most prevalent conditions in this cohort were hyperlipidaemia, hypertension and arthritis. For the validation cohort, 333 460 individuals (median age 54; 52% women) were identified and, to date, a total of 19 926 750 diagnosis codes, 10 527 444 heart rate measurements and 7 356 344 BP measurements have been retrieved for the baseline period.

FUTURE PLANS

Using advanced machine learning approaches, these electronic cohorts will be used to identify novel sex-specific risk factors for complex diseases. These approaches will allow us to address several challenges in the use of EHR data.
