Saengkyongam Sorawit, Pfister Niklas, Klasnja Predrag, Murphy Susan, Peters Jonas
Seminar for Statistics, ETH Zürich, Zürich, Switzerland.
Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark.
J Mach Learn Res. 2024;25.
Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested that exploiting invariant conditional distributions can yield models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop testing procedures to test e-invariance directly from data. We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.
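The abstract does not state the formal definitions; the following is a minimal LaTeX sketch of the intended contrast between full invariance and e-invariance, assuming an outcome Y, action A, covariates X, reference action a_0, and environment index E (this notation is our own and is not taken from the record).

% Full invariance: the entire conditional outcome distribution is stable across environments.
\[
P(Y \mid X, A, E = e) = P(Y \mid X, A, E = e') \quad \text{for all environments } e, e'.
\]
% Effect-invariance (e-invariance), as a relaxation: only the effect of the action on the
% conditional mean, relative to a reference action a_0, needs to be stable across environments.
\[
\mathbb{E}[Y \mid X, A = a, E = e] - \mathbb{E}[Y \mid X, A = a_0, E = e]
  = \mathbb{E}[Y \mid X, A = a, E = e'] - \mathbb{E}[Y \mid X, A = a_0, E = e']
  \quad \text{for all } a, e, e'.
\]
Under this reading, e-invariance is weaker than full invariance, which is what makes it a relaxation suitable for policy (i.e., action-selection) generalization.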