Distributional anchor regression.

Author information

Kook Lucas, Sick Beate, Bühlmann Peter

Affiliations

Epidemiology, Biostatistics and Prevention Institute, University of Zurich, 8001 Zurich, Switzerland.

Institute of Data Analysis and Process Design, Zurich University of Applied Sciences, 8400 Winterthur, Switzerland.

Publication information

Stat Comput. 2022;32(3):39. doi: 10.1007/s11222-022-10097-z. Epub 2022 May 13.

DOI:10.1007/s11222-022-10097-z

PMID:35582000

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9106647/

Abstract

Prediction models often fail if train and test data do not stem from the same distribution. Out-of-distribution (OOD) generalization to unseen, perturbed test data is a desirable but difficult-to-achieve property for prediction models and in general requires strong assumptions on the data generating process (DGP). In a causally inspired perspective on OOD generalization, the test data arise from a specific class of interventions on exogenous random variables of the DGP, called anchors. Anchor regression models, introduced by Rothenhäusler et al. (J R Stat Soc Ser B 83(2):215-246, 2021. 10.1111/rssb.12398), protect against distributional shifts in the test data by employing causal regularization. However, so far anchor regression has only been used with a squared-error loss which is inapplicable to common responses such as censored continuous or ordinal data. Here, we propose a distributional version of anchor regression which generalizes the method to potentially censored responses with at least an ordered sample space. To this end, we combine a flexible class of parametric transformation models for distributional regression with an appropriate causal regularizer under a more general notion of residuals. In an exemplary application and several simulation scenarios we demonstrate the extent to which OOD generalization is possible.
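For orientation, the causal regularization mentioned in the abstract can be illustrated with the original squared-error anchor regression of Rothenhäusler et al., not the distributional version proposed in this paper. The sketch below assumes centered variables, no intercept, and a linear model; it minimizes ||(I - P_A)(y - Xb)||^2 + gamma * ||P_A(y - Xb)||^2, where P_A projects onto the column space of the anchors. The function name `anchor_regression`, the simulated data, and the chosen values of `gamma` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def anchor_regression(X, y, A, gamma):
    """Linear anchor regression with squared-error loss (illustrative sketch).

    Assumes X, y and A are centered and that no intercept is needed.
    """
    # Orthogonal projection onto the column space of the anchors A.
    P = A @ np.linalg.pinv(A)
    # With W = I - (1 - sqrt(gamma)) * P we have
    # ||W r||^2 = ||(I - P) r||^2 + gamma * ||P r||^2,
    # so the penalized problem reduces to least squares on transformed data.
    W = np.eye(len(y)) - (1.0 - np.sqrt(gamma)) * P
    coef, *_ = np.linalg.lstsq(W @ X, W @ y, rcond=None)
    return coef


# Hypothetical toy data: anchor A shifts X, hidden confounder H affects X and y.
rng = np.random.default_rng(0)
n = 500
A = rng.normal(size=(n, 1))
H = rng.normal(size=n)
X = (A[:, 0] + H + rng.normal(size=n)).reshape(-1, 1)
y = X[:, 0] + H + rng.normal(size=n)

beta_ols = anchor_regression(X, y, A, gamma=1.0)      # gamma = 1 is ordinary least squares
beta_anchor = anchor_regression(X, y, A, gamma=10.0)  # gamma > 1 penalizes anchor-correlated residuals
```

Setting `gamma = 1` recovers ordinary least squares, while larger values put more weight on the part of the residuals explained by the anchors. The paper's contribution is to replace the squared-error loss and the ordinary residuals used here with a transformation-model likelihood and a more general notion of residuals, which this sketch does not attempt to reproduce.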
