Stojanov Petar, Gong Mingming, Carbonell Jaime G, Zhang Kun
Computer Science Department, Carnegie Mellon University.
University of Pittsburgh, Carnegie Mellon University.
Proc Mach Learn Res. 2019 Apr;89:3449-3458.
Covariate shift is a prevalent setting for supervised learning in the wild, where the training and test data are drawn from different time periods, from different but related domains, or via different sampling strategies. This paper addresses a transfer learning setting with covariate shift between the source and target domains. Most existing methods for correcting covariate shift exploit density ratios of the features to reweight the source-domain data, and when the features are high-dimensional, the estimated density ratios may suffer from large estimation variance, leading to poor prediction performance. In this work, we investigate how covariate shift correction performance depends on the dimensionality of the features, and propose a correction method that finds a low-dimensional representation of the features, taking into account the features relevant to the target, and exploits the density ratio of this representation for importance reweighting. We discuss the factors affecting the performance of our method and demonstrate its capabilities on both pseudo-real and real-world data.
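As a point of reference for the reweighting scheme the abstract describes, below is a minimal sketch of the standard density-ratio importance-reweighting baseline, not the authors' low-dimensional method. It assumes a hypothetical helper `estimate_density_ratio` and uses the well-known probabilistic-classification trick (train a classifier to separate source from target samples and convert its posterior into a ratio); the Gaussian toy data and the Ridge predictor are illustrative choices only.

```python
# Sketch of density-ratio importance reweighting under covariate shift.
# Not the paper's method: the ratio is estimated on the full feature space,
# which is exactly the regime where high dimensionality inflates variance.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge


def estimate_density_ratio(X_source, X_target):
    """Estimate w(x) = p_target(x) / p_source(x) at the source points."""
    X = np.vstack([X_source, X_target])
    # Domain label: 0 = source, 1 = target.
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_source)[:, 1]           # P(domain = target | x)
    prior_correction = len(X_source) / len(X_target)
    return prior_correction * p / (1.0 - p)         # density ratio at source x


# Pseudo-real setup: source and target features drawn from shifted Gaussians
# while the conditional p(y | x) stays fixed, i.e., covariate shift only.
rng = np.random.default_rng(0)
X_s = rng.normal(0.0, 1.0, size=(500, 5))
X_t = rng.normal(0.5, 1.0, size=(500, 5))
y_s = X_s @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=500)

# Reweight the source data and fit a predictor with importance weights.
w = estimate_density_ratio(X_s, X_t)
model = Ridge(alpha=1.0).fit(X_s, y_s, sample_weight=w)
```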