Wang Cong, Xu Shengyuan, Yuan Deming, Zhang Baoyong, Zhang Zhengqiang
IEEE Trans Cybern. 2022 Apr;52(4):2263-2273. doi: 10.1109/TCYB.2020.2999309. Epub 2022 Apr 5.
In this article, we concentrate on distributed online convex optimization problems over multiagent systems, where the communication between nodes is represented by a class of directed graphs that are time varying and uniformly strongly connected. The problem is studied under bandit feedback, in the sense that at each time only the value of the cost function at the committed point is revealed to each node. Nodes then update their decisions by exchanging information with their neighbors only. To deal with Lipschitz-continuous and strongly convex cost functions, a distributed online convex optimization algorithm that achieves sublinear individual regret for every node is developed. The algorithm builds on the push-sum scheme, which removes the requirement of doubly stochastic weight matrices, and on a one-point gradient estimator, which requires the function value at only one point per iteration instead of the gradient of the loss function. The expected regret of the proposed algorithm scales as O(√(T ln(T))), where T is the number of iterations. To validate the performance of the algorithm developed in this article, we give a simulation of a common numerical example.
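The two building blocks named in the abstract can be illustrated compactly. The sketch below is not the paper's exact update rule (step sizes, the exploration radius schedule, and the projection step are omitted); the ring topology, the self-weights in `splits`, and the function names are illustrative assumptions. It shows (i) a one-point gradient estimator, which queries the loss at a single perturbed point per iteration, and (ii) push-sum averaging, which reaches the network-wide average using only column-stochastic (not doubly stochastic) weights over a directed graph.

```python
import numpy as np

def one_point_gradient(f, x, delta, rng):
    """One-point bandit gradient estimator: query f at a single randomly
    perturbed point and return (d/delta) * f(x + delta*u) * u, where u is
    uniform on the unit sphere. This is an unbiased estimate of the
    gradient of a delta-smoothed version of f."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                 # uniform direction on the sphere
    return (d / delta) * f(x + delta * u) * u

def push_sum_average(X0, A, iters):
    """Push-sum averaging over a directed network. A is column stochastic
    (each node splits its outgoing mass among out-neighbors); doubly
    stochastic weights are NOT required. Node i keeps a numerator row X[i]
    and a scalar weight Y[i]; the de-biased ratio Z = X / Y converges to
    the average of the initial rows of X0."""
    X = X0.astype(float).copy()
    Y = np.ones((X0.shape[0], 1))
    for _ in range(iters):
        X, Y = A @ X, A @ Y                # one communication round
    return X / Y

# Directed ring of 4 nodes with unequal, column-stochastic weights
# (columns sum to 1, rows do not -- so A is not doubly stochastic).
splits = [0.5, 0.2, 0.7, 0.4]              # hypothetical self-weights
n = len(splits)
A = np.zeros((n, n))
for j, s in enumerate(splits):
    A[j, j] = s                            # mass node j keeps
    A[(j + 1) % n, j] = 1.0 - s            # mass sent to its out-neighbor
```

In the full algorithm, each node would alternate a push-sum mixing round with a descent step along the negative one-point gradient estimate of its local loss, projected back onto the feasible set; the decaying exploration radius is what drives the O(√(T ln(T))) expected-regret bound.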