On Missingness Features in Machine Learning Models for Critical Care: Observational Study.

Author information

Singh Janmajay, Sato Masahiro, Ohkuma Tomoko

Affiliation

Fuji Xerox Co, Ltd, Yokohama, Japan.

Publication information

JMIR Med Inform. 2021 Dec 8;9(12):e25022. doi: 10.2196/25022.

Abstract

BACKGROUND

Missing data in electronic health records are inevitable and are considered to be nonrandom. Several studies have found that features indicating patterns of missing data (missingness) encode useful information about a patient's health, and have advocated including them in clinical prediction models. However, their effectiveness has not been comprehensively evaluated.
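
The abstract does not specify how missingness features were encoded. One common encoding, assumed here purely for illustration, pairs each variable's last observed value with two extra channels: a binary observation mask and the hours elapsed since the variable was last measured. The function name `missingness_features` and the 0.0 fill before the first measurement are hypothetical choices. This is not the authors' exact scheme.

```python
from typing import List, Optional, Sequence, Tuple


def missingness_features(
    values: Sequence[Optional[float]], times: Sequence[float]
) -> Tuple[List[float], List[int], List[float]]:
    """Split one variable's time series into a value channel and two missingness channels.

    Hypothetical encoding (an assumption, not the study's documented scheme):
      filled: last observed value carried forward (0.0 before the first observation)
      mask:   1 if the variable was measured at this time step, else 0
      delta:  hours since the most recent measurement (0.0 when measured now;
              hours since the first time step if not yet measured)
    """
    filled: List[float] = []
    mask: List[int] = []
    delta: List[float] = []
    last_value = 0.0  # assumed pre-observation fill, e.g. a normalized population mean
    last_time: Optional[float] = None
    for value, time in zip(values, times):
        if value is not None:
            last_value, last_time = value, time
            mask.append(1)
        else:
            mask.append(0)
        filled.append(last_value)
        reference = last_time if last_time is not None else times[0]
        delta.append(time - reference)
    return filled, mask, delta


# Hourly heart rate with three unmeasured hours (toy values, not study data).
hr_filled, hr_mask, hr_delta = missingness_features(
    [None, 80.0, None, None, 92.0], [0.0, 1.0, 2.0, 3.0, 4.0]
)
# hr_filled == [0.0, 80.0, 80.0, 80.0, 92.0]
# hr_mask   == [0, 1, 0, 0, 1]
# hr_delta  == [0.0, 0.0, 1.0, 2.0, 0.0]
```

Under this encoding, a model that uses only pathological features would receive just `filled`. A model with missingness features would receive all three channels for each variable.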

OBJECTIVE

The goal of the research is to study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes, and to explore the robustness of these features across patient subgroups and task settings.

METHODS

A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, with mortality, length of stay, and sepsis as the outcomes. The 2019 dataset was multicenter, which allowed external validation. Gated recurrent units were used to learn sequential patterns in the data and to classify or predict the labels of interest. Models were evaluated on several criteria assessing discriminative ability and calibration, both overall and across population subgroups.
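
The abstract states only that gated recurrent units (GRUs) were used. The sketch below assumes the simplest way of including missingness features: at each time step, the missingness channels are concatenated with the values and fed to a single-layer GRU, and the final hidden state is mapped to an outcome probability. `TinyGRU` is a hypothetical, dependency-free illustration with random, untrained weights. It shows the data flow only and is not the authors' architecture.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Sequence[float]) -> Vector:
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyGRU:
    """Single-layer GRU plus a logistic output head (random weights, illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One weight set per gate: z (update), r (reset), n (candidate state).
        self.W = {g: init(hidden_size, input_size) for g in "zrn"}   # input-to-hidden
        self.U = {g: init(hidden_size, hidden_size) for g in "zrn"}  # hidden-to-hidden
        self.b = {g: [0.0] * hidden_size for g in "zrn"}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
        self.b_out = 0.0

    def _pre(self, gate: str, x: Sequence[float], h: Sequence[float]) -> Vector:
        # Pre-activation W x + U h + b for one gate.
        return [a + c + bias for a, c, bias in
                zip(matvec(self.W[gate], x), matvec(self.U[gate], h), self.b[gate])]

    def step(self, x: Sequence[float], h: Vector) -> Vector:
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._pre("n", x, reset_h)]
        # Interpolate between the previous state and the candidate state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def predict(self, sequence: Sequence[Sequence[float]]) -> float:
        """Run the GRU over all time steps and return P(outcome) from the last state."""
        h: Vector = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return sigmoid(sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out)


# Per-step inputs [normalized value, mask, hours since last measurement] (toy values).
with_missingness = [[0.0, 0.0, 0.0], [0.4, 1.0, 0.0], [0.4, 0.0, 1.0]]
values_only = [[row[0]] for row in with_missingness]

p_with = TinyGRU(input_size=3, hidden_size=4).predict(with_missingness)
p_without = TinyGRU(input_size=1, hidden_size=4).predict(values_only)
```

The two probabilities are not comparable here because the weights are untrained. In a real comparison, both models would be trained on the same outcome labels and evaluated on held-out data.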

RESULTS

Including missingness features generally improved model performance in retrospective tasks. The extent of improvement depended on the outcome of interest (the area under the receiver operating characteristic curve [AUROC] improved by 1.2% to 7.7%) and even on the patient subgroup. However, missingness features did not display utility in a simulated prospective setting, where the model relying only on pathological features outperformed the model that included them (a 0.9% difference in AUROC). This occurred even though including missingness features led to earlier detection of disease (true positives), because it also led to a concomitant rise in false-positive detections.
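
The prospective trade-off can be illustrated with a simplified alerting simulation. The following rules are hypothetical simplifications, not the study's evaluation protocol:

- The model emits one risk score per hour.
- An alert fires at the first hour whose score reaches a threshold.
- An alert at or before onset counts as a true positive, with a lead time in hours.
- Any alert for a patient who never develops the condition counts as a false positive.

The names `evaluate_alerts`, `first_alert`, and `AlertSummary` are illustrative. The scores are invented toy numbers, not study data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class AlertSummary:
    true_positives: int = 0
    false_positives: int = 0
    false_negatives: int = 0
    true_negatives: int = 0
    lead_times: List[int] = field(default_factory=list)  # hours from first alert to onset


def first_alert(scores: Sequence[float], threshold: float) -> Optional[int]:
    """Hour index of the first score at or above the threshold, or None if there is none."""
    for hour, score in enumerate(scores):
        if score >= threshold:
            return hour
    return None


def evaluate_alerts(
    stays: Sequence[Sequence[float]],
    onsets: Sequence[Optional[int]],
    threshold: float,
) -> AlertSummary:
    """Count stay-level outcomes; onset is None for stays without the condition."""
    summary = AlertSummary()
    for scores, onset in zip(stays, onsets):
        alert = first_alert(scores, threshold)
        if onset is None:
            if alert is None:
                summary.true_negatives += 1
            else:
                summary.false_positives += 1
        elif alert is not None and alert <= onset:
            summary.true_positives += 1
            summary.lead_times.append(onset - alert)
        else:  # no alert, or the first alert came after onset
            summary.false_negatives += 1
    return summary


# Toy hourly scores for three stays; only the first develops the condition (onset at hour 3).
onsets = [3, None, None]
values_only_scores = [[0.1, 0.2, 0.4, 0.6], [0.1, 0.2, 0.3, 0.2], [0.2, 0.1, 0.2, 0.3]]
with_missingness_scores = [[0.2, 0.6, 0.7, 0.8], [0.3, 0.6, 0.4, 0.3], [0.2, 0.2, 0.3, 0.3]]

summary_values_only = evaluate_alerts(values_only_scores, onsets, threshold=0.5)
# 1 true positive with 0 h lead time, 0 false positives, 2 true negatives
summary_with_missingness = evaluate_alerts(with_missingness_scores, onsets, threshold=0.5)
# 1 true positive with 2 h lead time, 1 false positive, 1 true negative
```

In this toy case, the missingness-aware scores raise the alarm two hours earlier for the positive stay. They also alarm on a negative stay, which mirrors the trade-off described in the results.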

CONCLUSIONS

This study comprehensively evaluated the effectiveness of missingness features in machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings, especially for administrative tasks such as length of stay prediction, where they present the greatest benefit. Although missingness features, which are representative of health care processes, vary greatly because of intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. However, their use in prospective models that produce frequent predictions needs further exploration.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/295a/8701717/8ccd591f16cc/medinform_v9i12e25022_fig1.jpg
