Pham Thai-Hoang, Zhang Xueru, Zhang Ping
ArXiv. 2024 May 10:arXiv:2405.06816v1.
Although recent advances in machine learning have demonstrated success in learning from independent and identically distributed (IID) data, models remain vulnerable to out-of-distribution (OOD) data in an open world. Domain generalization (DG) addresses this issue: it aims to learn a model from multiple source domains that generalizes to unseen target domains. Existing studies on DG have largely focused on stationary settings with homogeneous source domains. However, in many applications, domains may evolve along a specific direction (e.g., time, space). Without accounting for such non-stationary patterns, models trained with existing methods may fail to generalize on OOD data. In this paper, we study domain generalization in non-stationary environments. We first examine the impact of environmental non-stationarity on model performance and establish theoretical upper bounds on the model error at target domains. Then, we propose a novel algorithm based on adaptive invariant representation learning, which leverages the non-stationary pattern to train a model that attains good performance on target domains. Experiments on both synthetic and real data validate the proposed algorithm.
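To make the general idea of invariant representation learning over ordered source domains concrete, the following is a minimal, hypothetical sketch and not the paper's algorithm: it assumes PyTorch, synthetic two-dimensional domains whose feature distribution drifts with a time index, a shared encoder trained with a classification loss, and a simple mean-matching penalty that aligns the representations of consecutive domains. The drift model, the mean-matching surrogate, and the fixed penalty weight are illustrative assumptions only.

```python
# Illustrative sketch (assumptions, not the authors' method): learn a
# representation that is approximately invariant across time-ordered domains.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_domain(t, n=200):
    """Synthetic domain t: the second feature drifts linearly with the domain index."""
    y = torch.randint(0, 2, (n,))
    shift = torch.stack([y.float() * 2 - 1, torch.full((n,), 0.5 * t)], dim=1)
    return torch.randn(n, 2) + shift, y

domains = [make_domain(t) for t in range(4)]  # ordered source domains

encoder = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 8))
classifier = nn.Linear(8, 2)
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-2)
ce = nn.CrossEntropyLoss()

def mean_gap(a, b):
    """Crude distribution-distance surrogate: squared gap between feature means."""
    return ((a.mean(0) - b.mean(0)) ** 2).sum()

for epoch in range(200):
    opt.zero_grad()
    feats = [encoder(x) for x, _ in domains]
    task_loss = sum(ce(classifier(f), y) for f, (_, y) in zip(feats, domains))
    # Align representations of consecutive domains so the encoder does not
    # encode the drift direction; 0.5 is an arbitrary illustrative weight.
    align_loss = sum(mean_gap(feats[t], feats[t + 1]) for t in range(len(domains) - 1))
    (task_loss + 0.5 * align_loss).backward()
    opt.step()
```

In this toy setup, the alignment term only matches consecutive domains rather than all pairs, reflecting the assumption that domains evolve along one direction; the paper's adaptive scheme for exploiting that non-stationary pattern is not reproduced here.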