Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand.
PLoS One. 2019 Sep 9;14(9):e0220624. doi: 10.1371/journal.pone.0220624. eCollection 2019.
Due to the speed at which advanced equipment generates and collects data, the volume of data easily exceeds the available memory space, making it difficult to achieve high learning accuracy. Several methods based on the discard-after-learn concept have been proposed. Some were designed to cope with a single incoming datum, while others were designed for a chunk of incoming data. Although the results of these approaches are rather impressive, most of them learn new incoming data by temporarily adding more neurons without any neuron-merging process, which clearly increases the computational time and space complexities. Only the online versatile elliptic basis function (VEBF) introduced neuron merging to reduce the space-time complexity, and only for learning a single incoming datum. This paper proposes a method that further enhances the discard-after-learn concept for a streaming data-chunk environment in terms of low computational time and neural space complexities. A set of recursive functions for computing the relevant parameters of a new neuron, based on a statistical confidence interval, is introduced. The newly proposed method, named streaming chunk incremental learning (SCIL), increases the plasticity and adaptability of the network structure according to the distribution of incoming data and their classes. When compared with other incremental-like methods on 11 benchmark data sets of 150 to 581,012 samples, with attributes ranging from 4 to 1,558, formed as streaming data, the proposed SCIL gave better accuracy and running time on most data sets.
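The abstract's "set of recursive functions for computing the relevant parameters of a new neuron" can be illustrated with a minimal sketch: folding each incoming chunk into a neuron's running count, mean, and covariance and then discarding the chunk, in the spirit of discard-after-learn. The function name `merge_chunk` and the use of the standard pooled-moments recursion are assumptions for illustration; the paper's exact recursions and its confidence-interval criterion are not reproduced here.

```python
import numpy as np

def merge_chunk(n, mu, cov, X):
    """Recursively fold a new data chunk X (m x d array) into the running
    statistics of a neuron: sample count n, mean mu (d,), and biased
    covariance cov (d x d). The chunk can be discarded afterwards, since
    no raw samples are retained. Hypothetical sketch using the standard
    pooled-moments update, not the paper's exact formulas."""
    m = X.shape[0]
    mu_x = X.mean(axis=0)
    # Scatter matrix of the chunk about its own mean.
    S_x = (X - mu_x).T @ (X - mu_x)
    # Recover the neuron's scatter from its biased covariance.
    S_old = cov * n
    delta = mu_x - mu
    n_new = n + m
    # Recursive mean update: weighted shift toward the chunk mean.
    mu_new = mu + (m / n_new) * delta
    # Recursive scatter update: old scatter + chunk scatter + cross term.
    S_new = S_old + S_x + (n * m / n_new) * np.outer(delta, delta)
    return n_new, mu_new, S_new / n_new
```

Because the update is exact, merging chunks one at a time yields the same mean and covariance as a batch computation over all samples, while keeping memory usage constant in the number of chunks seen.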