Yang James J, Buu Anne
Department of Biostatistics and Data Science, University of Texas Health Science Center, Houston, Texas, U.S.A.
Department of Health Promotion and Behavioral Sciences, University of Texas Health Science Center, Houston, Texas, U.S.A.
Data Sci Sci. 2024;3(1). doi: 10.1080/26941899.2024.2383770. Epub 2024 Aug 2.
Singular Spectrum Analysis (SSA) is a useful tool for extracting signals from noisy time series. However, the structural insights that SSA provides depend strongly on the choice of window length. The conventional recommendation of a larger window length works well for short to moderately sized time series, but for longer series it becomes computationally burdensome and can inflate the mean squared reconstruction error. This study addresses this methodological gap by introducing an adaptive sequential SSA method that iteratively selects an optimal window length so that the essential eigen-sequences (signals) are extracted efficiently with minimal reconstruction error. The proposed method applies to both short-to-moderate and long time series. Simulation studies demonstrate its efficacy when the observed data are the sum of two sinusoidal functions and noise. An analysis of 6-day heart rate data from a young adult e-cigarette user reveals a distinct clustering of vaping events in the scatter plot of the first and third eigen-sequences, indicating the potential for developing "digital biomarkers" of vaping behavior from extracted eigen-sequences in future studies. In conclusion, the adaptive sequential SSA method offers a robust and flexible approach to signal extraction in diverse time series applications.
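To make the role of the window length concrete, the following is a minimal sketch of basic, fixed-window SSA (embedding, SVD, and diagonal averaging) applied to a simulated sum of two sinusoids plus noise. It is not the adaptive sequential procedure described in the abstract. The function name `ssa_reconstruct`, the rank of 4, the candidate window lengths, and all simulation parameters are assumptions chosen for illustration.

```python
import numpy as np


def ssa_reconstruct(series, window, rank):
    """Basic SSA with a fixed window (illustrative, not the adaptive method)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if not 2 <= window <= n - 1:
        raise ValueError("window must satisfy 2 <= window <= n - 1")
    k = n - window + 1
    # Embedding: column j of the trajectory matrix holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    # Decomposition: singular value decomposition of the trajectory matrix.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Grouping: keep only the leading `rank` components.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # Diagonal averaging: average all entries that map to the same time index.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return recon / counts


# Simulated data (assumed parameters): two sinusoids plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 25) + 0.5 * np.sin(2 * np.pi * t / 60)
observed = signal + rng.normal(scale=0.5, size=t.size)

# Each sinusoid contributes two components, hence rank=4.
for window in (20, 100, 200):
    estimate = ssa_reconstruct(observed, window, rank=4)
    mse = np.mean((estimate - signal) ** 2)
    print(f"window={window:3d}  reconstruction MSE={mse:.4f}")
```

Running the loop prints the mean squared reconstruction error for each candidate window, which is the quantity a window-selection rule would compare. The trajectory matrix has `window` rows and `n - window + 1` columns, so the cost of the SVD grows with both the series length and the window, which is the computational burden noted for long series.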