Stochastic Gradient Descent for Kernel-Based Maximum Correntropy Criterion.

Authors

Li Tiankai, Wang Baobin, Peng Chaoquan, Yin Hong

Affiliations

School of Mathematics and Statistics, South-Central Minzu University, Wuhan 430074, China.

School of Mathematics, Renmin University of China, Beijing 100872, China.

Publication Information

Entropy (Basel). 2024 Dec 17;26(12):1104. doi: 10.3390/e26121104.

Abstract

The maximum correntropy criterion (MCC) has been an important method in the machine learning and signal processing communities since it was successfully applied in various non-Gaussian noise scenarios. In contrast to the classical least squares (LS) method, which takes only the second-order moment of the model into account and leads to a convex optimization problem, MCC captures higher-order information of the model, which plays a crucial role in robust learning but usually comes at the cost of solving a non-convex optimization problem. The theoretical study of convex optimization has made significant achievements, while the theoretical understanding of non-convex optimization is still far from mature. Motivated by the popularity of stochastic gradient descent (SGD) for solving non-convex problems, this paper considers SGD applied to the kernel version of MCC, which has been shown to be robust to outliers and non-Gaussian data in nonlinear structured models. As existing theoretical results for the SGD algorithm applied to kernel MCC are not well established, we present a rigorous analysis of its convergence behavior and provide explicit convergence rates under standard conditions. Our work fills the gap between the optimization process and convergence during the iterations: the iterates need to converge to the global minimizer, whereas the estimator obtained in the learning process cannot guarantee global optimality.
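
The setting can be made concrete with a small sketch. The following is a minimal illustration under standard assumptions, not the authors' exact algorithm or analysis: it runs one pass of online SGD in an RKHS using the correntropy-induced (Welsch) loss, a Gaussian kernel, and a polynomially decaying step size. All names and parameter choices (kernel_mcc_sgd, eta0, bandwidth, sigma) are illustrative.

import numpy as np

def gaussian_kernel(x, z, bandwidth=1.0):
    # Gaussian (RBF) kernel; works for scalars or 1-D vectors.
    d = np.atleast_1d(x) - np.atleast_1d(z)
    return np.exp(-np.dot(d, d) / (2 * bandwidth ** 2))

def kernel_mcc_sgd(X, y, sigma=1.0, eta0=0.5, bandwidth=1.0):
    # One pass of online SGD for kernel MCC (illustrative sketch).
    # The iterate f_t lives in the RKHS and is stored as a kernel
    # expansion over the visited points:
    #   f_t(x) = sum_i alphas[i] * K(centers[i], x).
    # The correntropy-induced (Welsch) loss is
    #   l_sigma(u) = sigma^2 * (1 - exp(-u^2 / (2 sigma^2))),
    # whose derivative u * exp(-u^2 / (2 sigma^2)) down-weights large
    # residuals, unlike the linear derivative of the squared loss.
    centers, alphas = [], []
    for t, (x_t, y_t) in enumerate(zip(X, y), start=1):
        f_xt = sum(a * gaussian_kernel(c, x_t, bandwidth)
                   for c, a in zip(centers, alphas))
        residual = f_xt - y_t
        grad = residual * np.exp(-residual ** 2 / (2 * sigma ** 2))
        eta_t = eta0 / np.sqrt(t)      # a standard decaying step size
        centers.append(x_t)
        alphas.append(-eta_t * grad)   # f_{t+1} = f_t - eta_t * grad * K(x_t, .)
    return centers, alphas

# Toy usage: linear data with heavy outliers that would dominate an LS fit.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=80)
y = 2.0 * X + 0.1 * rng.standard_normal(80)
y[::20] += 10.0
centers, alphas = kernel_mcc_sgd(X, y, sigma=1.0)

Because the weight exp(-residual^2 / (2 sigma^2)) vanishes for large residuals, outliers contribute little to each update, which is the robustness the abstract refers to; the same re-weighting makes the empirical objective non-convex in f, which is why the convergence analysis is non-trivial.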


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a24d/11675914/bb26c0bc004d/entropy-26-01104-g001.jpg
