University of Picardie Jules Verne, Amiens, France; University of the South Pacific, Suva, Fiji.
Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy.
Neural Netw. 2018 Jul;103:108-117. doi: 10.1016/j.neunet.2018.03.017. Epub 2018 Apr 6.
High-dimensional big data is an increasingly challenging field of research. Many techniques exist for inferring information from such data. However, because of the curse of dimensionality, dimensionality reduction (DR) is a necessary preliminary step. DR can be performed by linear or nonlinear algorithms; in general, linear algorithms are faster because they carry a lower computational burden. A related problem is handling time-varying high-dimensional data, where the time dependence arises from a nonstationary data distribution. Data stream algorithms are not able to project into lower-dimensional spaces: only linear projections, such as principal component analysis (PCA), are used in real time, while nonlinear techniques need the whole database (offline). The Growing Curvilinear Component Analysis (GCCA) neural network addresses this problem. It has a self-organized incremental architecture that adapts to the changing data distribution, and it simultaneously performs data quantization and projection by using curvilinear component analysis (CCA), a nonlinear distance-preserving reduction technique. This is achieved by introducing the ideas of the "seed", a pair of neurons that colonizes the input domain, and the "bridge", a novel kind of edge in the manifold graph that signals data non-stationarity. Some artificial examples and a real application are presented, together with a comparison with other existing techniques.
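To make the phrase "distance-preserving" concrete, the following is a minimal sketch of the classical (offline) CCA idea that GCCA builds on. It is not the GCCA network described in the abstract. It assumes a step-function neighborhood weighting with radius `lam`, and the names `dist`, `cca_stress` and `cca_step` are hypothetical, chosen only for illustration. The cost penalizes mismatches between input-space and projected distances, but only for pairs that are close in the projection; one update pins a point and moves the others radially so that their projected distances approach the input distances.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def cca_stress(X, Y, lam):
    """CCA-style cost: 0.5 * sum over i != j of (D_ij - d_ij)^2 * F(d_ij).

    D_ij is the distance in the input space X, d_ij the distance in the
    projection Y. F is assumed here to be a step function: 1 if d_ij <= lam,
    else 0, so only pairs that are close in the projection contribute.
    """
    total = 0.0
    n = len(X)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_out = dist(Y[i], Y[j])
            if d_out <= lam:
                total += (dist(X[i], X[j]) - d_out) ** 2
    return 0.5 * total

def cca_step(X, Y, i, alpha, lam):
    """Pin projection Y[i] and move every other Y[j] along the line through Y[i].

    Each Y[j] is pushed away from Y[i] when the projected distance is too
    short and pulled closer when it is too long. Y is modified in place.
    """
    for j in range(len(Y)):
        if j == i:
            continue
        d_out = dist(Y[i], Y[j])
        if d_out == 0.0 or d_out > lam:
            continue  # coincident points have no direction; far pairs are ignored
        scale = alpha * (dist(X[i], X[j]) - d_out) / d_out
        Y[j] = [yj + scale * (yj - yi) for yj, yi in zip(Y[j], Y[i])]
```

In an offline setting such steps are typically repeated over randomly chosen points while `alpha` and `lam` shrink, which requires access to the whole dataset; this is the limitation that an incremental architecture is meant to remove.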