Gu Bin, Bao Runxue, Zhang Chenkang, Huang Heng
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17099-17110. doi: 10.1109/TNNLS.2023.3299756. Epub 2024 Dec 2.
Pairwise learning is an important machine-learning topic with many practical applications. Online algorithms are the first choice for processing streaming data and are preferred for handling large-scale pairwise learning problems. However, existing online pairwise learning algorithms are neither scalable nor efficient enough for large-scale, high-dimensional data, because they were designed around singly stochastic gradients. To address this challenging problem, in this article, we propose a dynamic doubly stochastic gradient algorithm (D2SG) for online pairwise learning. In particular, incorporating a new sample requires only O(d) time and space complexity, where d is the dimensionality of the data. This means that our D2SG is much faster and more scalable than existing online pairwise learning algorithms, while statistical accuracy is guaranteed through our rigorous theoretical analysis under standard assumptions. Experimental results on a variety of real-world datasets not only confirm the theoretical results for our new D2SG algorithm but also show that D2SG has better efficiency and scalability than existing online pairwise learning algorithms.
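To make the problem setting concrete, here is a minimal sketch of online pairwise learning with a per-sample stochastic gradient step. This is NOT the D2SG algorithm from the paper, only an illustrative singly stochastic baseline of the kind the abstract contrasts against: each arriving sample is paired with one randomly buffered earlier sample, and a linear model is updated on a squared pairwise loss in O(d) time per arrival. All names, the loss choice, and the buffer strategy are assumptions for illustration.

```python
import numpy as np

def online_pairwise_sgd(stream, d, eta=0.1, buffer_size=100, seed=0):
    """Toy online pairwise learner (squared pairwise loss, linear model).

    Illustrative only -- not the paper's D2SG. Each new sample (x, y) is
    paired with one randomly chosen buffered sample, giving one O(d)
    stochastic gradient step per arrival.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    buf = []  # previously seen (x, y) samples
    for x, y in stream:
        if buf:
            xb, yb = buf[rng.integers(len(buf))]
            diff = x - xb                       # pairwise feature difference
            target = 1.0 if y != yb else 0.0    # AUC-style ranking target (assumed)
            grad = 2.0 * (w @ diff - target) * diff  # gradient of (w.diff - target)^2
            w -= eta * grad
        if len(buf) < buffer_size:
            buf.append((x, y))
    return w
```

In this sketch the buffer is what makes the gradient "pairwise": the loss couples the current sample with an earlier one, which is exactly the structural difficulty that motivates specialized online algorithms such as D2SG.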