IEEE Trans Neural Netw Learn Syst. 2015 Oct;26(10):2464-76. doi: 10.1109/TNNLS.2014.2387313. Epub 2015 Jan 21.
Traditional analyses of online kernel learning assume that the training sequence is independently and identically distributed (i.i.d.). Recent studies reveal that, when the loss function is smooth and strongly convex, given T i.i.d. training instances, a constant sampling complexity of random Fourier features suffices to ensure an O(logT/T) convergence rate of the excess risk, which is optimal for online kernel learning up to a logT factor. However, the i.i.d. assumption is too strong in practice, which greatly limits the value of these results. In this paper, we study the sampling complexity of random Fourier features in online kernel learning under non-i.i.d. assumptions. We prove that the sampling complexity remains constant in the non-i.i.d. setting, but the convergence rate of the excess risk becomes O(logT/T + ϕ), where ϕ is the mixing coefficient measuring the extent to which the training sequence deviates from i.i.d. We conduct experiments on both artificial and real large-scale data sets to verify our theory.
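For readers unfamiliar with the feature map the abstract refers to, the following is a minimal sketch of the standard random Fourier feature approximation of a Gaussian kernel (Rahimi and Recht style); the function name, parameters, and kernel choice are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def rff_features(X, num_features=100, gamma=1.0, rng=None):
    """Map rows of X to random Fourier features whose inner products
    approximate the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Frequencies drawn from the spectral density of the RBF kernel
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Usage: inner products of the features approximate kernel evaluations,
# so an online learner can work with a fixed-dimensional linear model.
X = np.random.randn(5, 3)
Z = rff_features(X, num_features=2000, gamma=0.5)
approx_K = Z @ Z.T  # ≈ exp(-0.5 * ||x_i - x_j||^2) entrywise
```

The sampling complexity discussed in the abstract is the number of such random features (num_features above) needed for the stated excess-risk guarantee.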