IEEE Trans Cybern. 2022 Nov;52(11):11373-11384. doi: 10.1109/TCYB.2021.3070420. Epub 2022 Oct 17.
Learning from streaming data poses several distinctive challenges, including concept drift, label scarcity, and high dimensionality. Many concept drift-aware data stream learning algorithms have been proposed over the past decades to tackle these issues. However, most existing algorithms follow a supervised learning framework and need the true class label of every instance to update their models. In many real-world streaming applications, obtaining all labels is infeasible, so learning from data streams with minimal labels is the more practical scenario. To address both the curse of dimensionality and label scarcity, in this article we present a new semisupervised learning technique for streaming data. To mitigate the curse of dimensionality, we employ a denoising autoencoder to transform the high-dimensional feature space into a reduced, compact, and more informative feature representation. Furthermore, we use a cluster-and-label technique to reduce the dependency on true class labels. We employ a synchronization-based dynamic clustering technique to summarize the streaming data into a set of dynamic microclusters, which are then used for classification. In addition, we employ a disagreement-based learning method to cope with concept drift. Extensive experiments on many real-world datasets demonstrate the superior performance of the proposed method compared to several state-of-the-art methods.
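The cluster-and-label idea named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the synchronization-based clustering is replaced here by a plain nearest-centroid rule, the denoising autoencoder and the disagreement-based drift handling are omitted, and the names `MicroCluster`, `ClusterAndLabel`, and `radius` are hypothetical. The sketch only shows how a few labeled instances can name microclusters that unlabeled instances then refine.

```python
import math
from collections import Counter


class MicroCluster:
    """Running summary of nearby points: centroid, size, and label votes."""

    def __init__(self, point, label=None):
        self.centroid = list(point)
        self.n = 1
        self.votes = Counter()
        if label is not None:
            self.votes[label] += 1

    def absorb(self, point, label=None):
        # Incremental mean update of the centroid.
        self.n += 1
        self.centroid = [c + (x - c) / self.n for c, x in zip(self.centroid, point)]
        if label is not None:
            self.votes[label] += 1

    def label(self):
        # Majority label among the labeled points absorbed so far, or None.
        return self.votes.most_common(1)[0][0] if self.votes else None


class ClusterAndLabel:
    """Simplified stand-in: nearest-centroid microclusters, not synchronization-based."""

    def __init__(self, radius):
        self.radius = radius  # assumed absorption threshold, not from the paper
        self.clusters = []

    def _nearest(self, point, labeled_only=False):
        pool = [c for c in self.clusters if c.votes] if labeled_only else self.clusters
        if not pool:
            return None, math.inf
        best = min(pool, key=lambda c: math.dist(c.centroid, point))
        return best, math.dist(best.centroid, point)

    def learn(self, point, label=None):
        # Labeled and unlabeled instances both update the cluster summaries.
        cluster, distance = self._nearest(point)
        if cluster is not None and distance <= self.radius:
            cluster.absorb(point, label)
        else:
            self.clusters.append(MicroCluster(point, label))

    def predict(self, point):
        # Classify with the majority label of the nearest labeled microcluster.
        cluster, _ = self._nearest(point, labeled_only=True)
        return cluster.label() if cluster is not None else None


model = ClusterAndLabel(radius=1.0)
model.learn((0.0, 0.0), "a")   # labeled instance
model.learn((5.0, 5.0), "b")   # labeled instance
model.learn((0.2, 0.1))        # unlabeled instance, absorbed by the nearby cluster
print(model.predict((0.3, 0.3)))
```

Only two of the three instances carry labels here; the unlabeled one still shifts its cluster's centroid, which is the sense in which cluster-and-label reduces the dependency on true class labels.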