Chalongvorachai Thasorn, Woraratpanya Kuntpong
Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Department and Organization, 1 Chalong Krung 1 Alley, Lat Krabang, 10520, Bangkok, Thailand.
Heliyon. 2021 Jul 30;7(8):e07687. doi: 10.1016/j.heliyon.2021.e07687. eCollection 2021 Aug.
Unlike data augmentation, data generation for extremely rare cases can produce a large number of high-quality samples from very few original data. This is useful for anomaly detection and classification tasks, where publicly available research datasets are limited. Other approaches, such as data augmentation techniques, have attempted to solve this problem, but none of them guarantees the characteristics of the synthesized samples. Previously, we introduced a framework called Data Augmentation and Generation for Anomalous Time-series Signals (DAGAT), which combines the following components: Data Augmentation, a Variational Autoencoder (VAE), a Data Picker (DP), a Signal Fragment Assembler (SFA), and a Quality Classifier (QC). An upgraded framework, An Advanced Data Generation for Anomalous Signals (ADGAS), was then introduced to remove two limitations of DAGAT: uncontrollable outputs and the possibility of bad data being included in the training set. By restructuring the DAGAT architecture, ADGAS produces better generated samples. Nonetheless, ADGAS can still be improved through better SFA, DP, and QC components. Hence, this paper proposes a Data Generation Framework for Extremely Rare Case Signals. The proposed framework can generate reliable data for various objectives. We evaluated it using a 1D-CNN as the performance evaluator in multi-class anomaly classification, with the water treatment and water distribution testbeds (SWaT and WADI) as real-world anomaly datasets. The results show that it outperforms baseline anomaly data augmentation and data generation methods.