Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms.

Authors

Gu Bin, Wei Xiyuan, Zhang Hualin, Chang Yi, Huang Heng

Affiliations

School of Artificial Intelligence, Jilin University, Changchun, 130012 Jilin, China.

Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, U.A.E.

Publication

Neural Comput. 2024 Apr 23;36(5):897-935. doi: 10.1162/neco_a_01636.

DOI: 10.1162/neco_a_01636
PMID: 38457756
Abstract

Zeroth-order (ZO) optimization is one key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance-reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them opted for the coordinated ZO estimator over the random ZO estimator when approximating the true gradient, since the former is more accurate. While the random ZO estimator introduces a larger error and makes convergence analysis more challenging than the coordinated ZO estimator, it requires only $\mathcal{O}(1)$ computation, significantly less than the $\mathcal{O}(d)$ computation of the coordinated ZO estimator, with $d$ being the dimension of the problem space. To take advantage of the computationally efficient nature of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of the convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which can automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate of the inner solver satisfies the ZOOD property. By applying the two reduction frameworks to our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from $\mathcal{O}(\min\{dn^{1/2}\epsilon^{-2}, d\epsilon^{-3}\})$ to $\tilde{\mathcal{O}}(n + d\epsilon^{-2})$ under $d > n^{1/2}$ for nonconvex problems, and from $\mathcal{O}(d\epsilon^{-2})$ to $\tilde{\mathcal{O}}(n\log(1/\epsilon) + d\epsilon^{-1})$ for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.
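The query-cost contrast between the two estimators is the crux of the abstract. Below is a minimal NumPy sketch of both estimators plus a single ZO proximal gradient step on an L1-regularized toy objective. It illustrates the general technique only, not the paper's ZOR-ProxSVRG or ZOR-ProxSAGA algorithms; the smoothing radius mu, step size eta, regularization weight lam, and the quadratic objective f are all illustrative assumptions.

```python
import numpy as np

def coordinated_zo_grad(f, x, mu=1e-5):
    """Coordinated ZO estimator: central differences along all d
    coordinates, costing 2d function queries per gradient (O(d))."""
    d = x.shape[0]
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = 1.0
        g[i] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    return g

def random_zo_grad(f, x, mu=1e-5, rng=None):
    """Random ZO estimator: one forward difference along a single
    random Gaussian direction, costing 2 function queries (O(1))."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.standard_normal(x.shape[0])
    return (f(x + mu * u) - f(x)) / mu * u

def prox_l1(x, lam):
    """Proximal map of lam * ||.||_1 (soft thresholding), handling
    the nonsmooth part of the composite objective."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# One ZO proximal gradient step on f(x) + lam * ||x||_1; the
# quadratic f here is an assumed toy problem, not from the paper.
f = lambda z: 0.5 * z @ z
x, eta, lam = np.ones(5), 0.1, 0.01
x_coord = prox_l1(x - eta * coordinated_zo_grad(f, x), eta * lam)  # 2d queries
x_rand = prox_l1(x - eta * random_zo_grad(f, x), eta * lam)        # 2 queries
```

The random estimator touches the objective only twice per gradient, but probing a single random direction makes its estimate far noisier; that extra error is exactly what the ZOOD property is designed to absorb in the convergence bound.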

Similar Articles

1
Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms.
Neural Comput. 2024 Apr 23;36(5):897-935. doi: 10.1162/neco_a_01636.
2
Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity.
IEEE Trans Pattern Anal Mach Intell. 2024 Aug 14;PP. doi: 10.1109/TPAMI.2023.3347082.
3
Hessian-Aided Random Perturbation (HARP) Using Noisy Zeroth-Order Oracles.
IEEE Trans Neural Netw Learn Syst. 2023 Jul;34(7):3717-3726. doi: 10.1109/TNNLS.2021.3117999. Epub 2023 Jul 6.
4
Stochastic proximal gradient methods for nonconvex problems in Hilbert spaces.
Comput Optim Appl. 2021;78(3):705-740. doi: 10.1007/s10589-020-00259-y. Epub 2021 Jan 12.
5
Nonconvex Sparse Regularization for Deep Neural Networks and Its Optimality.
Neural Comput. 2022 Jan 14;34(2):476-517. doi: 10.1162/neco_a_01457.
6
Optimal Computational and Statistical Rates of Convergence for Sparse Nonconvex Learning Problems.
Ann Stat. 2014;42(6):2164-2201. doi: 10.1214/14-AOS1238.
7
Painless Stochastic Conjugate Gradient for Large-Scale Machine Learning.
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14645-14658. doi: 10.1109/TNNLS.2023.3280826. Epub 2024 Oct 7.
8
Scalable Proximal Jacobian Iteration Method With Global Convergence Analysis for Nonconvex Unconstrained Composite Optimizations.
IEEE Trans Neural Netw Learn Syst. 2019 Sep;30(9):2825-2839. doi: 10.1109/TNNLS.2018.2885699. Epub 2019 Jan 15.
9
Dualityfree Methods for Stochastic Composition Optimization.
IEEE Trans Neural Netw Learn Syst. 2019 Apr;30(4):1205-1217. doi: 10.1109/TNNLS.2018.2866699. Epub 2018 Sep 12.
10
Zeroth-order gradient tracking for decentralized learning with privacy guarantees.
ISA Trans. 2024 Sep;152:1-14. doi: 10.1016/j.isatra.2024.06.033. Epub 2024 Jul 3.