Does Cohort Selection Affect Machine Learning from Clinical Data?

Author information

Haghighathoseini Atefehsadat, Wojtusiak Janusz, Min Hua, Leslie Timothy, Frankenfeld Cara, Menon Nirup M

Affiliations

George Mason University, Fairfax, VA, USA.

MaineHealth Institute for Research, Scarborough, ME, USA.

Publication information

AMIA Annu Symp Proc. 2025 May 22;2024:473-482. eCollection 2024.

Abstract

This study investigates cohort selection and its effects on the quality of machine learning (ML) models trained on clinical data, focusing on measurements taken within the first 48 hours of hospital admission. It discusses the potential repercussions of arbitrary decisions made during data processing, before ML methods are applied. Experiments are performed within the framework of the National COVID Cohort Collaborative (N3C) dataset. The research aims to uncover biases and assess the fairness of ML models used to predict outcomes for hospitalized patients. The discussion covers the data, the decision-making processes, and the resulting impact on model predictions of patient outcomes. In the experiment, four arbitrary decisions are made, resulting in 16 distinct datasets of varying sizes and properties. The findings demonstrate significant differences among the resulting datasets and indicate a high potential for bias arising from inclusion or exclusion decisions. The results also confirm significant differences in the performance of models constructed on different cohorts, especially when models are cross-compared between cohorts based on different inclusion criteria. The study specifically analyzes gender, race, and ethnicity because these social determinants of health played a significant role in COVID-19 outcomes.
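
As a rough illustration of how four decisions lead to 16 datasets, the sketch below enumerates every combination of four yes/no choices, assuming each decision is binary (2^4 = 16). The decision names are hypothetical placeholders; the abstract does not state what the four decisions were, and this is not the authors' code.

```python
from itertools import product

# Hypothetical placeholder names: the abstract does not list the four decisions.
DECISIONS = ["decision_a", "decision_b", "decision_c", "decision_d"]


def enumerate_cohorts(decisions):
    """Return one configuration dict per combination of binary choices."""
    return [
        dict(zip(decisions, choices))
        for choices in product([False, True], repeat=len(decisions))
    ]


cohorts = enumerate_cohorts(DECISIONS)
print(len(cohorts))  # one configuration per candidate dataset
```

Each configuration would then drive one pass of cohort filtering over the source data, so models trained on one configuration can be evaluated on the others.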
