NTT Communication Science Laboratories, NTT Corporation, Seika-cho, Kyoto, 619-0237, Japan.
Neural Comput. 2013 May;25(5):1324-70. doi: 10.1162/NECO_a_00442.
Divergence estimators based on direct approximation of density ratios, without going through separate approximation of the numerator and denominator densities, have been successfully applied to machine learning tasks that involve distribution comparison, such as outlier detection, transfer learning, and two-sample homogeneity testing. However, since density-ratio functions often fluctuate strongly, divergence estimation is a challenging task in practice. In this letter, we use relative divergences for distribution comparison, which involve approximation of relative density ratios. Since relative density ratios are always smoother than the corresponding ordinary density ratios, the proposed method is favorable in terms of nonparametric convergence speed. Furthermore, we show that the proposed divergence estimator has an asymptotic variance independent of the model complexity under a parametric setup, implying that the proposed estimator hardly overfits even with complex models. Through experiments, we demonstrate the usefulness of the proposed approach.
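The smoothness property claimed in the abstract can be illustrated numerically. The relative density ratio for two densities p and q with mixing parameter α is r_α(x) = p(x) / (α p(x) + (1 − α) q(x)), which is bounded above by 1/α even where the ordinary ratio p(x)/q(x) diverges. The following is a minimal sketch (not the authors' estimator) using two hypothetical Gaussian densities:

```python
import numpy as np

def gauss_pdf(x, mu):
    """Standard-deviation-1 Gaussian density centered at mu."""
    return np.exp(-(x - mu) ** 2 / 2) / np.sqrt(2 * np.pi)

alpha = 0.5                      # mixing parameter of the relative ratio
x = np.linspace(-5, 5, 1001)
p = gauss_pdf(x, mu=0.0)         # numerator density p(x)
q = gauss_pdf(x, mu=1.0)         # denominator density q(x)

# Ordinary density ratio: grows without bound as x -> -inf.
ordinary = p / q

# Relative density ratio: bounded above by 1/alpha, hence smoother
# and easier to approximate nonparametrically.
relative = p / (alpha * p + (1 - alpha) * q)

print(ordinary.max())            # large: ordinary ratio blows up
print(relative.max())            # never exceeds 1/alpha = 2
```

Estimating r_α directly (as the proposed method does, rather than estimating p and q separately) therefore targets a bounded, better-behaved function.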