A framework for understanding label leakage in machine learning for health care.

Author information

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, United States.

Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, United States.

Publication information

J Am Med Inform Assoc. 2023 Dec 22;31(1):274-280. doi: 10.1093/jamia/ocad178.

Abstract

INTRODUCTION

The pitfalls of label leakage, the contamination of model input features with outcome information, are well established. Unfortunately, avoiding label leakage in clinical prediction models requires more nuance than the common advice of applying the "no time machine" rule.
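As a point of reference, the "no time machine" rule is commonly read as: a feature may only use information that was already recorded when the prediction is made. The sketch below illustrates that reading only; it is not taken from the paper. The event names, timestamps, and the helper `features_available_at` are all hypothetical. As the abstract notes, a timestamp filter of this kind is the common advice, not a complete safeguard.

```python
from datetime import datetime

def features_available_at(events, prediction_time):
    """Keep only events recorded strictly before the prediction time.

    This is the plain "no time machine" filter: nothing from the
    future of the prediction moment may enter the feature set.
    """
    return [e for e in events if e["recorded_at"] < prediction_time]

# Hypothetical patient timeline (illustrative values only).
events = [
    {"name": "heart_rate", "value": 112, "recorded_at": datetime(2023, 1, 1, 8, 0)},
    {"name": "lactate", "value": 3.1, "recorded_at": datetime(2023, 1, 1, 9, 30)},
    # Recorded after the prediction time, so using it would leak outcome information.
    {"name": "antibiotic_order", "value": 1, "recorded_at": datetime(2023, 1, 1, 14, 0)},
]

prediction_time = datetime(2023, 1, 1, 10, 0)
usable = features_available_at(events, prediction_time)
usable_names = [e["name"] for e in usable]
print(usable_names)  # ['heart_rate', 'lactate']
```

A filter like this catches features timestamped after the prediction, but it says nothing about features that are recorded earlier yet still carry outcome information, which is the kind of nuance the abstract says the rule misses.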

FRAMEWORK

We provide a framework for contemplating whether and when model features pose leakage concerns by considering the cadence, perspective, and applicability of predictions. To ground these concepts, we use real-world clinical models to highlight examples of appropriate and inappropriate label leakage in practice.

RECOMMENDATIONS

Finally, we provide recommendations to support clinical and technical stakeholders as they evaluate the leakage tradeoffs associated with model design, development, and implementation decisions. By providing common language and dimensions to consider when designing models, we hope the clinical prediction community will be better prepared to develop statistically valid and clinically useful machine learning models.

Similar articles

The future of Cochrane Neonatal.
Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Must-have Qualities of Clinical Research on Artificial Intelligence and Machine Learning.
Balkan Med J. 2023 Jan 23;40(1):3-12. doi: 10.4274/balkanmedj.galenos.2022.2022-11-51. Epub 2022 Dec 29.

Probabilistic Machine Learning for Healthcare.
Annu Rev Biomed Data Sci. 2021 Jul 20;4:393-415. doi: 10.1146/annurev-biodatasci-092820-033938. Epub 2021 Jun 1.
