A New Insight Into Missing Data in Intensive Care Unit Patient Profiles: Observational Study.

Author information

Sharafoddini Anis, Dubin Joel A, Maslove David M, Lee Joon

Affiliations

Health Data Science Lab, School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada.

Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada.

Publication information

JMIR Med Inform. 2019 Jan 8;7(1):e11605. doi: 10.2196/11605.

DOI:10.2196/11605

PMID:30622091

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6329436/

Abstract

BACKGROUND

Data missing from patient profiles in intensive care units (ICUs) are substantial and unavoidable. However, this incompleteness is not always random, nor is it always the result of imperfections in the data collection process.

OBJECTIVE

This study aimed to investigate the potential hidden information in data missing from electronic health records (EHRs) in an ICU and examine whether the presence or missingness of a variable itself can convey information about the patient health status.

METHODS

Daily retrieval of laboratory test (LT) measurements from the Medical Information Mart for Intensive Care III (MIMIC-III) database was set as our reference for defining complete patient profiles. Missingness indicators were introduced to represent the presence or absence of each LT in a patient profile. Thereafter, various feature selection methods (filter and embedded methods) were used to examine the predictive power of the missingness indicators. Finally, a set of well-known prediction models (logistic regression [LR], decision tree, and random forest) was used to evaluate whether the absence of a variable's recording can itself provide predictive power. We also examined whether missingness indicators improve predictive performance when used together with observed laboratory measurements as model input. The outcomes of interest were in-hospital mortality and mortality at 30 days after ICU discharge.
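The idea of a missingness indicator can be illustrated with a minimal Python sketch. It assumes a hypothetical four-test panel and a toy patient-day; the names `LAB_TESTS` and `missingness_indicators` and the values are invented for illustration and are not taken from the study.

```python
from typing import Dict, List, Optional

# Hypothetical daily laboratory panel (an assumption; the abstract does not list the tests).
LAB_TESTS: List[str] = ["lactate", "creatinine", "bilirubin", "wbc"]


def missingness_indicators(
    profile: Dict[str, Optional[float]],
    tests: List[str] = LAB_TESTS,
) -> Dict[str, int]:
    """Return 1 for each test with no recorded value on this day, else 0."""
    return {f"{test}_missing": int(profile.get(test) is None) for test in tests}


# Toy patient-day: lactate is recorded as None and bilirubin is absent entirely.
day1 = {"creatinine": 1.1, "wbc": 9.4, "lactate": None}
indicators = missingness_indicators(day1)
print(indicators)
```

Each indicator can then be passed to a feature selection method or a classifier, either alone or alongside the observed measurements.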

RESULTS

Regardless of mortality type or ICU day, more than 40% of the predictors selected by feature selection methods were missingness indicators. Notably, using missingness indicators as the only predictors achieved reasonable mortality prediction on all days and for all mortality types (for instance, in 30-day mortality prediction with LR, we achieved an area under the receiver operating characteristic curve [AUROC] of 0.6836±0.012). Including indicators alongside observed measurements in the prediction models also improved the AUROC; the maximum improvement was 0.0426. Indicators also improved the AUROC of the Simplified Acute Physiology Score II model (a well-known ICU severity-of-illness score), confirming that the indicators carry additive information (AUROC of 0.8045±0.0109 for 30-day mortality prediction with LR).
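For readers unfamiliar with the metric, the AUROC equals the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores are invented toy values, not study data, and the function name `auroc` is an assumption.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (invented): 1 = died, 0 = survived; scores are predicted risks.
died = [1, 0, 1, 0, 0, 1]
risk = [0.9, 0.2, 0.4, 0.4, 0.1, 0.7]
print(round(auroc(died, risk), 4))
```

An improvement such as the one reported above would be the difference between two such values, one from a model with indicators and one from a model without them, computed on the same patients.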

CONCLUSIONS

Our study demonstrated that the presence or absence of LT measurements is informative and can be considered a potential predictor of in-hospital and 30-day mortality. The comparative analysis of prediction models also showed statistically significant prediction improvement when indicators were included. Moreover, missing data might reflect the opinions of examining clinicians. Therefore, the absence of measurements can be informative in ICUs and has predictive power beyond the measured data themselves. This initial case study shows promise for more in-depth analysis of missing data and its informativeness in ICUs. Future studies are needed to generalize these results.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07c3/6329436/605afa822d08/medinform_v7i1e11605_fig1.jpg
