Robust extraction of pneumonia-associated clinical states from electronic health records.

Affiliations

Department of Engineering Sciences and Applied Math, Northwestern University, Evanston, IL 60208.

Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL 60208.

Publication information

Proc Natl Acad Sci U S A. 2024 Nov 5;121(45):e2417688121. doi: 10.1073/pnas.2417688121. Epub 2024 Oct 30.

DOI:10.1073/pnas.2417688121
PMID:39475648
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11551366/
Abstract

Mining of electronic health records (EHR) promises to automate the identification of comprehensive disease phenotypes. However, the realization of this promise is hindered by the unavailability of generalizable ground-truth information, data incompleteness and heterogeneity, and the lack of generalization to multiple cohorts. We present here a data-driven approach to identify clinical states that we implement for 585 critical care patients with suspected pneumonia recruited by the SCRIPT study, which we compare to and integrate with 9,918 pneumonia patients from the MIMIC-IV dataset. We extract and curate from their structured EHRs a primary set of clinical features (53 and 59 features for SCRIPT and MIMIC-IV, respectively), including disease severity scores, vital signs, and so on, at various degrees of completeness. We aggregate irregular time series into daily frequency, resulting in 12,495 and 94,684 patient-day pairs for SCRIPT and MIMIC, respectively. We define a "common-sense" ground truth that we then use in a semisupervised pipeline to optimize choices for data preprocessing, and reduce the feature space to four principal components. We describe and validate an ensemble-based clustering method that enables us to robustly identify five clinical states, and use a Gaussian mixture model to quantify uncertainty in cluster assignment. Demonstrating the clinical relevance of the identified states, we find that three states are strongly associated with disease outcomes (dying vs. recovering), while the other two reflect disease etiology. The outcome associated clinical states provide significantly increased discrimination of mortality rates over standard approaches.
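
The preprocessing and clustering steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes pandas and scikit-learn, uses synthetic data with hypothetical column names (`patient_id`, `time`, `feat_0` to `feat_5`), aggregates each day by a simple mean (the abstract does not state the aggregation rule), and fits a Gaussian mixture directly instead of reproducing the ensemble-based clustering method. Only the reduction to four principal components and the five states are taken from the abstract.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_toy_records(n_patients=20, n_obs=60, n_features=6):
    """Build synthetic, irregularly timed measurements (illustrative only)."""
    frames = []
    for pid in range(n_patients):
        # Random measurement times (in hours) within a 5-day stay.
        hours = np.sort(rng.uniform(0, 120, size=n_obs))
        times = pd.Timestamp("2024-01-01") + pd.to_timedelta(hours, unit="h")
        values = rng.normal(loc=pid % 5, scale=1.0, size=(n_obs, n_features))
        frame = pd.DataFrame(values, columns=[f"feat_{i}" for i in range(n_features)])
        frame.insert(0, "time", times)
        frame.insert(0, "patient_id", pid)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


def to_patient_days(records):
    """Aggregate irregular time series to one row per (patient, day)."""
    return (
        records.assign(day=records["time"].dt.floor("D"))
        .drop(columns="time")
        .groupby(["patient_id", "day"])
        .mean()  # assumption: daily mean; other summaries are possible
    )


def cluster_states(daily, n_components=4, n_states=5):
    """Standardize, project onto principal components, fit a Gaussian mixture."""
    scaled = StandardScaler().fit_transform(daily.to_numpy())
    pcs = PCA(n_components=n_components, random_state=0).fit_transform(scaled)
    gmm = GaussianMixture(n_components=n_states, random_state=0).fit(pcs)
    probs = gmm.predict_proba(pcs)  # soft assignment of each patient-day
    return pcs, probs


records = make_toy_records()
daily = to_patient_days(records)
pcs, probs = cluster_states(daily)

# Entropy of the posterior is one simple way to summarize assignment uncertainty.
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
print(daily.shape, pcs.shape, probs.shape)
```

Each row of `probs` sums to one, so a patient-day that sits between two states shows up as a split posterior and a higher entropy rather than as a hard label.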

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72df/11551366/2bd2cd030f31/pnas.2417688121fig01.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72df/11551366/8fd852ec74ec/pnas.2417688121fig02.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72df/11551366/c74ca483adc7/pnas.2417688121fig03.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72df/11551366/b4c708e69caf/pnas.2417688121fig04.jpg

Similar articles

1
Robust extraction of pneumonia-associated clinical states from electronic health records.
Proc Natl Acad Sci U S A. 2024 Nov 5;121(45):e2417688121. doi: 10.1073/pnas.2417688121. Epub 2024 Oct 30.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
Democratizing EHR analyses with FIDDLE: a flexible data-driven preprocessing pipeline for structured clinical data.
J Am Med Inform Assoc. 2020 Dec 9;27(12):1921-1934. doi: 10.1093/jamia/ocaa139.
4
A flexible data-driven comorbidity feature extraction framework.
Comput Biol Med. 2016 Jun 1;73:165-72. doi: 10.1016/j.compbiomed.2016.04.014. Epub 2016 Apr 20.
5
A method for cohort selection of cardiovascular disease records from an electronic health record system.
Int J Med Inform. 2017 Jun;102:138-149. doi: 10.1016/j.ijmedinf.2017.03.015. Epub 2017 Mar 30.
6
DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS.
Pac Symp Biocomput. 2017;22:230-241. doi: 10.1142/9789813207813_0023.
7
A Comprehensive Natural Language Processing Pipeline for the Chronic Lupus Disease.
Stud Health Technol Inform. 2024 Aug 22;316:909-913. doi: 10.3233/SHTI240559.
8
High-throughput phenotyping with temporal sequences.
J Am Med Inform Assoc. 2021 Mar 18;28(4):772-781. doi: 10.1093/jamia/ocaa288.
9
Text-mining in electronic healthcare records can be used as efficient tool for screening and data collection in cardiovascular trials: a multicenter validation study.
J Clin Epidemiol. 2021 Apr;132:97-105. doi: 10.1016/j.jclinepi.2020.11.014. Epub 2020 Nov 25.
10
Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries.
J Biomed Inform. 2019 Nov;99:103310. doi: 10.1016/j.jbi.2019.103310. Epub 2019 Oct 14.
