Li Changsheng, Wei Fan, Dong Weishan, Wang Xiangfeng, Liu Qingshan, Zhang Xin
IEEE Trans Pattern Anal Mach Intell. 2019 Feb;41(2):323-336. doi: 10.1109/TPAMI.2018.2794446. Epub 2018 Jan 17.
Online multiple-output regression is an important machine learning technique for modeling, predicting, and compressing multi-dimensional correlated data streams. In this paper, we propose a novel online multiple-output regression method for streaming data, called MORES. MORES can dynamically learn the structure of the regression coefficients to facilitate the model's continuous refinement. Because the limited expressive ability of regression models often causes the residual errors to be dependent, MORES is also designed to dynamically learn and leverage the structure of the residual errors to improve prediction accuracy. Moreover, we introduce three modified covariance matrices to extract the necessary information from all the seen data for training, and assign different weights to samples so as to track the evolving characteristics of the data streams. Furthermore, an efficient algorithm is designed to optimize the proposed objective function, and an efficient online eigenvalue decomposition algorithm is developed for the modified covariance matrix. Finally, we analyze the convergence of MORES under certain ideal conditions. Experiments on two synthetic datasets and three real-world datasets validate the effectiveness and efficiency of MORES. In addition, MORES can process at least 2,000 instances per second (including training and testing) on the three real-world datasets, more than 12 times faster than the state-of-the-art online learning algorithm.
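The idea of summarizing all seen data in covariance-type matrices while weighting recent samples more heavily can be illustrated with a generic technique: online multi-output ridge regression with an exponential forgetting factor. The sketch below is NOT the MORES algorithm. It does not implement MORES's objective, its three specific modified covariance matrices, its coefficient-structure learning, or its online eigenvalue decomposition. The class `ForgettingMultiOutputRidge` and its parameters `forget` and `ridge` are hypothetical names chosen for illustration, and the residual covariance `cee` is only accumulated to show where residual structure could be measured, not used for prediction.

```python
import numpy as np


class ForgettingMultiOutputRidge:
    """Hypothetical illustration (not MORES): online multi-output ridge
    regression whose sufficient statistics are exponentially discounted,
    so that recent samples carry more weight than old ones."""

    def __init__(self, n_features, n_outputs, forget=0.99, ridge=1e-3):
        self.forget = forget  # assumed per-step decay of past weights, 0 < forget <= 1
        self.ridge = ridge    # assumed Tikhonov regularization strength
        self.cxx = np.zeros((n_features, n_features))  # weighted sum of x x^T
        self.cxy = np.zeros((n_features, n_outputs))   # weighted sum of x y^T
        self.cee = np.zeros((n_outputs, n_outputs))    # weighted sum of residual e e^T
        self.W = np.zeros((n_features, n_outputs))     # coefficients, y_hat = W^T x

    def predict(self, x):
        return self.W.T @ np.asarray(x, dtype=float)

    def partial_fit(self, x, y):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        e = y - self.predict(x)  # residual of the current model, before updating
        # Discount all past statistics, then add the new sample with weight 1.
        self.cxx = self.forget * self.cxx + np.outer(x, x)
        self.cxy = self.forget * self.cxy + np.outer(x, y)
        self.cee = self.forget * self.cee + np.outer(e, e)
        # Re-solve the regularized normal equations (Cxx + ridge * I) W = Cxy.
        d = self.cxx.shape[0]
        self.W = np.linalg.solve(self.cxx + self.ridge * np.eye(d), self.cxy)
        return e


# Usage: learn a 3-input, 2-output linear map, then let it drift.
rng = np.random.default_rng(0)
A = np.array([[1.0, -2.0], [0.5, 3.0], [2.0, 0.0]])
model = ForgettingMultiOutputRidge(3, 2, forget=0.99, ridge=1e-6)
for _ in range(500):
    x = rng.normal(size=3)
    model.partial_fit(x, A.T @ x)
```

Because the weight of a sample seen k steps ago is `forget**k`, old data fades out and the coefficients follow a drifting target. Re-solving the system costs O(d^3) per step here for clarity; an efficient online method would instead use incremental updates, for example rank-one updates or an online eigendecomposition of the discounted matrix as the abstract describes for MORES.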