Evaluation of Feature Selection Methods for Preserving Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine.

Author affiliations

Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Ontario, Canada.

Biomedical Informatics Research, Stanford University, Palo Alto, California, United States.

Publication information

Methods Inf Med. 2023 May;62(1-02):60-70. doi: 10.1055/s-0043-1762904. Epub 2023 Feb 22.

Abstract

BACKGROUND

Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance.

METHODS

Our dataset consisted of intensive care unit patients from MIMIC-IV categorized by year groups (2008-2010, 2011-2013, 2014-2016, and 2017-2019). We trained baseline models using L2-regularized logistic regression on 2008-2010 to predict in-hospital mortality, long length of stay (LOS), sepsis, and invasive ventilation in all year groups. We evaluated three feature selection methods: L1-regularized logistic regression (L1), Remove and Retrain (ROAR), and causal feature selection. We assessed whether a feature selection method could maintain ID performance (2008-2010) and improve OOD performance (2017-2019). We also assessed whether parsimonious models retrained on OOD data performed as well as oracle models trained on all features in the OOD year group.
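The evaluation protocol described above can be illustrated with a minimal sketch on synthetic data. This is not the authors' code and does not use MIMIC-IV: the `make_year_group` helper, the size of the simulated shift, the feature count, and the regularization strengths are all assumptions chosen for illustration. Only the L1 selection method is shown; ROAR and causal feature selection are omitted. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

N_FEATURES = 50
# Hypothetical ground truth: only the first five features carry signal.
TRUE_COEF = np.array([1.5, -1.0, 0.8, 0.6, -0.7])


def make_year_group(n, shift, rng):
    """Synthetic stand-in for one year group; `shift` moves all feature means."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, N_FEATURES))
    logits = X[:, :5] @ TRUE_COEF
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    return X, y


def fit_l2(X, y):
    # scikit-learn's LogisticRegression applies an L2 penalty by default.
    return LogisticRegression(C=1.0, max_iter=1000).fit(X, y)


def auroc(model, X, y):
    return roc_auc_score(y, model.predict_proba(X)[:, 1])


rng = np.random.default_rng(0)
# "ID" plays the role of 2008-2010, "OOD" the role of 2017-2019.
X_id_train, y_id_train = make_year_group(2000, 0.0, rng)
X_id_test, y_id_test = make_year_group(1000, 0.0, rng)
X_ood_train, y_ood_train = make_year_group(2000, 0.5, rng)
X_ood_test, y_ood_test = make_year_group(1000, 0.5, rng)

# Baseline: L2-regularized model trained on all ID features.
baseline = fit_l2(X_id_train, y_id_train)

# L1 feature selection on ID data: keep features with nonzero coefficients.
l1_model = LogisticRegression(penalty="l1", C=0.01, solver="liblinear")
l1_model.fit(X_id_train, y_id_train)
selected = np.flatnonzero(l1_model.coef_[0])

# Parsimonious model trained on ID data, restricted to the selected features.
parsimonious = fit_l2(X_id_train[:, selected], y_id_train)
# Same selected features, but retrained on OOD data.
retrained = fit_l2(X_ood_train[:, selected], y_ood_train)
# Oracle: trained on OOD data with all features.
oracle = fit_l2(X_ood_train, y_ood_train)

results = {
    "baseline_id": auroc(baseline, X_id_test, y_id_test),
    "baseline_ood": auroc(baseline, X_ood_test, y_ood_test),
    "parsimonious_id": auroc(parsimonious, X_id_test[:, selected], y_id_test),
    "parsimonious_ood": auroc(parsimonious, X_ood_test[:, selected], y_ood_test),
    "retrained_ood": auroc(retrained, X_ood_test[:, selected], y_ood_test),
    "oracle_ood": auroc(oracle, X_ood_test, y_ood_test),
}

print(f"retained {len(selected)} of {N_FEATURES} features")
for name, value in results.items():
    print(f"{name}: AUROC = {value:.3f}")
```

The sketch compares discrimination (AUROC) only. The study also reports calibration, which would need an additional metric.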

RESULTS

The baseline model showed significantly worse OOD performance than ID performance on the long LOS and sepsis tasks. L1 and ROAR retained 3.7 to 12.6% of all features, whereas causal feature selection generally retained fewer features. Models produced by L1 and ROAR exhibited ID and OOD performance similar to that of the baseline models. Retraining these models on 2017-2019 data, using features selected from training on 2008-2010 data, generally reached parity with oracle models trained directly on 2017-2019 data using all available features. Causal feature selection led to heterogeneous results, with the superset maintaining ID performance while improving OOD calibration only on the long LOS task.

CONCLUSIONS

While model retraining can mitigate the impact of temporal dataset shift on parsimonious models produced by L1 and ROAR, new methods are required to proactively improve temporal robustness.
