Multi-Source Conformal Inference Under Distribution Shift.

Authors

Liu Yi, Levis Alexander W, Normand Sharon-Lise, Han Larry

Affiliations

North Carolina State University, Department of Statistics, Raleigh, NC, USA.

Carnegie Mellon University, Department of Statistics, Pittsburgh, PA, USA.

Publication

Proc Mach Learn Res. 2024 Jul;235:31344-31382.

Abstract

Recent years have seen growing use of complex machine learning models across multiple data sources to inform more generalizable decision-making. However, distribution shifts across data sources and privacy concerns about sharing individual-level data, coupled with a lack of uncertainty quantification for machine learning predictions, make it challenging to achieve valid inference in multi-source environments. In this paper, we consider the problem of obtaining distribution-free prediction intervals for a target population by leveraging multiple, potentially biased data sources. We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations, and show that one can incorporate machine learning prediction algorithms in the estimation of nuisance functions while still achieving parametric rates of convergence to nominal coverage probabilities. Moreover, when conditional outcome invariance is violated, we propose a data-adaptive strategy that upweights informative data sources for efficiency gain and downweights non-informative data sources for bias reduction. We highlight the robustness and efficiency of our proposals for a variety of conformal scores and data-generating mechanisms via extensive synthetic experiments. We illustrate the utility of our methodology by constructing prediction intervals for hospital length of stay among pediatric patients who underwent a high-risk cardiac surgical procedure in the U.S. between 2016 and 2022.
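
For readers unfamiliar with the terms "conformal score" and "distribution-free prediction interval" used in the abstract, the following is a minimal sketch of standard single-source split conformal prediction with an absolute-residual score. It is not the multi-source, influence-function-based procedure proposed in the paper. The function name `split_conformal_interval`, the simulated data, and the simple through-the-origin linear predictor are all assumptions made for illustration only.

```python
import numpy as np


def split_conformal_interval(predict, x_cal, y_cal, x_new, alpha=0.1):
    """Standard split conformal interval (illustrative, single source).

    `predict` is any fitted point predictor; the conformal score is the
    absolute residual |y - predict(x)| on a held-out calibration set.
    """
    scores = np.abs(y_cal - predict(x_cal))
    n = scores.shape[0]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the calibration set is too small, the interval is unbounded.
    q = np.inf if k > n else np.sort(scores)[k - 1]
    center = predict(x_new)
    return center - q, center + q


# Hypothetical simulated data: y = 2x + standard normal noise.
rng = np.random.default_rng(0)


def simulate(n):
    x = rng.uniform(-2.0, 2.0, size=n)
    y = 2.0 * x + rng.normal(size=n)
    return x, y


x_train, y_train = simulate(500)
x_cal, y_cal = simulate(500)
x_test, y_test = simulate(2000)

# Least-squares slope through the origin, fitted on the training split only.
slope = np.sum(x_train * y_train) / np.sum(x_train ** 2)


def predict(x):
    return slope * x


lower, upper = split_conformal_interval(predict, x_cal, y_cal, x_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage at nominal 90%: {coverage:.3f}")
```

When calibration and test data are exchangeable, this construction covers the outcome with probability at least 1 - alpha regardless of how good the predictor is. That guarantee can fail under the distribution shift between sources and target that the abstract describes, which is the setting the paper addresses.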



