Liu Yanfang, Fan Xiaocong, Li Wenbin, Gao Yang
IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):6725-6739. doi: 10.1109/TNNLS.2022.3178880. Epub 2023 Oct 5.
The idea of combining the active query strategy and the passive-aggressive (PA) update strategy in online learning can be credited to the PA active (PAA) algorithm, which has proven effective in learning linear classifiers from datasets with a fixed feature space. We propose a novel family of online active learning algorithms, named PAA learning for trapezoidal data streams (PAATS) and multiclass PAATS (MPAATS), together with their variants, for binary and multiclass online classification tasks on trapezoidal data streams, where the feature space may expand over time. In the context of an ever-changing feature space, we provide a theoretical analysis of the mistake bounds for both PAATS and MPAATS. Our experiments on a wide variety of benchmark datasets confirm that the combination of the instance-regulated active query strategy and the PA update strategy is much more effective in learning from trapezoidal data streams. We also compare PAATS with online learning with streaming features (OLSF), the state-of-the-art approach to learning linear classifiers from trapezoidal data streams. PAATS achieves much better classification accuracy than OLSF, especially on large-scale real-world data streams.
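To make the combination described above concrete, the following is a minimal sketch of one round of a generic PA learner with a randomized margin-based label query on a feature space that may grow. It is not the PAATS algorithm from the paper: the function name `pa_active_step`, the parameter `delta`, the callable `label_oracle`, the query probability `delta / (delta + |margin|)`, and the zero-padding of weights for newly arrived features are all illustrative assumptions.

```python
import random


def pa_active_step(w, x, label_oracle, delta=1.0, rng=random):
    """One round of a generic PA learner with a margin-based label query.

    Hypothetical sketch, not the paper's PAATS update.
    w            -- current weights (list of floats), len(w) <= len(x)
    x            -- current instance (list of floats); may carry new features
    label_oracle -- callable returning the true label (+1 or -1); it is
                    invoked only when the learner decides to query
    delta        -- assumed query parameter; larger values query more often
    """
    # Assumption: features that appear for the first time start at weight 0.
    w = list(w) + [0.0] * (len(x) - len(w))

    margin = sum(wi * xi for wi, xi in zip(w, x))
    prediction = 1 if margin >= 0 else -1

    # Randomized query: low-confidence (small-margin) instances are
    # queried with higher probability.
    queried = rng.random() < delta / (delta + abs(margin))

    if queried:
        y = label_oracle()
        loss = max(0.0, 1.0 - y * margin)          # hinge loss
        norm_sq = sum(xi * xi for xi in x)
        if loss > 0.0 and norm_sq > 0.0:
            tau = loss / norm_sq                   # basic PA step size
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]

    return w, prediction, queried
```

Calling the function once per arriving instance, and passing the returned weights into the next call, lets the weight vector grow together with the stream. The label is requested only when `queried` is true, which is where the labeling cost is saved.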