Wunderlich Adam, Sklar Jack
Communications Technology Laboratory, National Institute of Standards and Technology, Boulder, CO 80305, United States of America.
Mach Learn Sci Technol. 2023 Sep;4(3). doi: 10.1088/2632-2153/acee44.
Random noise arising from physical processes is an inherent characteristic of measurements and a limiting factor for most signal processing and data analysis tasks. Given the recent interest in generative adversarial networks (GANs) for data-driven modeling, it is important to determine to what extent GANs can faithfully reproduce noise in target data sets. In this paper, we present an empirical investigation that aims to shed light on this issue for time series. Namely, we assess two general-purpose GANs for time series that are based on the popular deep convolutional GAN architecture: a direct time-series model and an image-based model that uses a short-time Fourier transform data representation. The GAN models are trained and quantitatively evaluated using distributions of simulated noise time series with known ground-truth parameters. Target time series distributions include a broad range of noise types commonly encountered in physical measurements, electronics, and communication systems: band-limited thermal noise, power-law noise, shot noise, and impulsive noise. We find that GANs are capable of learning many noise types, although they predictably struggle when the GAN architecture is not well suited to some aspects of the noise, e.g., impulsive time series with extreme outliers. Our findings provide insights into the capabilities and potential limitations of current approaches to time-series GANs and highlight areas for further research. In addition, our battery of tests provides a useful benchmark to aid the development of deep generative models for time series.
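The abstract names four families of simulated target noise and an STFT-based image representation. The NumPy sketch below shows one plausible way such training series could be generated and turned into STFT "images". It rests on several assumptions, and every function name, filter shape, and parameter value in it is hypothetical, not taken from the paper. It uses a brick-wall low-pass filter for band-limited thermal noise and shapes FFT amplitudes by f^(-alpha/2) for power-law noise. For shot noise it convolves Poisson event counts with an exponential pulse, and for impulsive noise it uses a Bernoulli-Gaussian model. The Hann window, segment length, and hop in `stft_image` are placeholders.

```python
import numpy as np

# Hypothetical simulators for the four noise families named in the abstract.
# Filter shapes, pulse shapes, and parameter values are illustrative choices,
# not the settings used in the paper.

def band_limited_noise(n, cutoff, rng):
    """White Gaussian noise, ideally low-pass filtered at `cutoff` (cycles/sample)."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    spectrum[np.fft.rfftfreq(n) > cutoff] = 0.0  # brick-wall band limit
    x = np.fft.irfft(spectrum, n=n)
    return x / x.std()

def power_law_noise(n, alpha, rng):
    """Gaussian noise whose power spectral density falls off as 1/f**alpha."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-alpha / 2.0)  # amplitude f^(-alpha/2) -> power f^(-alpha)
    x = np.fft.irfft(spectrum * scale, n=n)  # DC bin stays zero, so x has zero mean
    return x / x.std()

def shot_noise(n, rate, tau, rng):
    """Poisson event counts per sample, each event producing an exp(-t/tau) pulse."""
    counts = rng.poisson(rate, size=n).astype(float)
    pulse = np.exp(-np.arange(int(10 * tau)) / tau)
    return np.convolve(counts, pulse)[:n]

def impulsive_noise(n, p, impulse_std, rng):
    """Bernoulli-Gaussian model: unit Gaussian background plus rare large impulses."""
    background = rng.standard_normal(n)
    hits = rng.random(n) < p  # each sample is an impulse with probability p
    return background + hits * rng.normal(0.0, impulse_std, size=n)

def stft_image(x, nperseg=64, hop=32):
    """Complex STFT with a Hann window, shaped (frequency bins, time frames)."""
    window = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, hop)  # requires len(x) >= nperseg
    frames = np.stack([x[s:s + nperseg] * window for s in starts])
    return np.fft.rfft(frames, axis=1).T

rng = np.random.default_rng(0)
n = 4096
samples = {
    "band_limited": band_limited_noise(n, cutoff=0.1, rng=rng),
    "power_law": power_law_noise(n, alpha=1.0, rng=rng),
    "shot": shot_noise(n, rate=0.05, tau=5.0, rng=rng),
    "impulsive": impulsive_noise(n, p=0.01, impulse_std=10.0, rng=rng),
}
for name, x in samples.items():
    print(name, x.shape, stft_image(x).shape)
```

Because each generator is driven by explicit parameters (cutoff, alpha, event rate, impulse probability), the ground truth of every target distribution is known. This is what allows a GAN's samples to be scored against it quantitatively. The heavy-tailed impulsive case also makes it easy to see why extreme outliers are hard to reproduce: a handful of samples dominate the fourth moment.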