Wang Cong, Xu Shengyuan, Yuan Deming
IEEE Trans Cybern. 2024 Jan;54(1):63-75. doi: 10.1109/TCYB.2022.3177644. Epub 2023 Dec 20.
This article studies the distributed online stochastic convex optimization problem with the time-varying constraint over a multiagent system constructed by various agents. The sequences of cost functions and constraint functions, both of which have dynamic parameters following time-varying distributions, are unacquainted to the agent ahead of time. Agents in the network are able to interact with their neighbors through a sequence of strongly connected and time-varying graphs. We develop the adaptive distributed bandit primal-dual algorithm whose step size and regularization sequences are adaptive and have no prior knowledge about the total iteration span T . The adaptive distributed bandit primal-dual algorithm applies bandit feedback with a one-point or two-point gradient estimator to evaluate gradient values. It is illustrated in this article that if the drift of the benchmark sequence is sublinear, then the adaptive distributed bandit primal-dual algorithm exhibits sublinear expected dynamic regret and constraint violation using both two kinds of gradient estimator to compute gradient information. We present a numerical experiment to show the performance of the proposed method.
本文研究了由多个智能体构建的多智能体系统上具有时变约束的分布式在线随机凸优化问题。代价函数序列和约束函数序列都具有遵循时变分布的动态参数,智能体事先并不知晓这些序列。网络中的智能体能够通过一系列强连通且时变的图与邻居进行交互。我们开发了自适应分布式带反馈原始对偶算法,其步长和正则化序列是自适应的,并且对总迭代次数T没有先验知识。自适应分布式带反馈原始对偶算法应用带反馈的单点或两点梯度估计器来评估梯度值。本文表明,如果基准序列的漂移是次线性的,那么使用这两种梯度估计器来计算梯度信息时,自适应分布式带反馈原始对偶算法会表现出次线性的期望动态遗憾和约束违反。我们给出了一个数值实验来展示所提方法的性能。