Kim Younghoon, Basu Sumanta, Banerjee Samprit
Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA.
Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
Stat Med. 2025 May;44(10-12):e70099. doi: 10.1002/sim.70099.
We develop a data-driven cosegmentation algorithm for passively sensed variables and self-reported active variables collected through smartphones. The goal is to identify emotionally stressful states in middle-aged and older patients with mood disorders who are undergoing therapy, some of whom also have chronic pain. Our method leverages the association between these two types of time series. These data are typically nonstationary, and meaningful associations often occur only over short time windows. Traditional machine learning (ML) methods applied globally to the entire time series often fail to capture these time-varying local patterns. Our approach first segments the passive sensing variables by detecting their change points. It then examines segment-specific associations with the active variable to identify cosegmented periods that exhibit distinct relationships between stress and the passively sensed measures. We then use these periods to predict future emotional stress states with standard ML methods. By shifting the unit of analysis from individual time points to data-driven segments of time, and by allowing different associations in different segments, our algorithm helps detect patterns that exist only within short time windows. We apply our method to detect periods of stress in patient data collected during the ALACRITY Phase I study. Our findings indicate that the data-driven segmentation algorithm identifies stress periods more accurately than traditional ML methods that do not incorporate segmentation.
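The segment-then-associate idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm. As stated assumptions, it detects change points in a single passive variable using binary segmentation under a mean-shift model (a split is kept only if it reduces the within-segment sum of squares by more than a penalty). It then computes the Pearson correlation with the active (self-reported) variable inside each segment and flags segments whose |r| reaches a threshold. The function names `binary_segmentation` and `cosegment`, the `penalty`, `min_size` and `threshold` values, and the toy data are all hypothetical.

```python
from math import sqrt


def sse(x, start, end):
    """Sum of squared deviations from the mean of x[start:end]."""
    seg = x[start:end]
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)


def binary_segmentation(x, penalty, min_size=3, start=0, end=None):
    """Hypothetical change-point detector (mean-shift model).

    Recursively splits x[start:end] at the index that most reduces the
    within-segment sum of squares; a split is kept only if that reduction
    exceeds `penalty`. Returns sorted change-point indices.
    """
    if end is None:
        end = len(x)
    if end - start < 2 * min_size:
        return []
    total = sse(x, start, end)
    best_gain, best_k = 0.0, None
    # Candidate splits leave at least `min_size` points on each side.
    for k in range(start + min_size, end - min_size + 1):
        gain = total - sse(x, start, k) - sse(x, k, end)
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None or best_gain <= penalty:
        return []
    left = binary_segmentation(x, penalty, min_size, start, best_k)
    right = binary_segmentation(x, penalty, min_size, best_k, end)
    return left + [best_k] + right


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def cosegment(passive, active, penalty, threshold=0.7, min_size=3):
    """Segment `passive`, then score each segment's association with `active`.

    Returns (change_points, [(start, end, r, flagged), ...]), where
    `flagged` marks a hypothetical "cosegment" with |r| >= threshold.
    """
    cps = binary_segmentation(passive, penalty, min_size)
    bounds = [0] + cps + [len(passive)]
    segments = []
    for s, e in zip(bounds[:-1], bounds[1:]):
        r = pearson(passive[s:e], active[s:e])
        segments.append((s, e, r, abs(r) >= threshold))
    return cps, segments


if __name__ == "__main__":
    # Toy data: the passive signal shifts level at t=8, and the active
    # signal tracks it only in the second regime.
    passive = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1,
               5.0, 5.4, 4.8, 5.2, 5.6, 4.9, 5.1, 5.3]
    active = [4, 6, 6, 4, 5, 5, 5, 5] + [2 * p - 5 for p in passive[8:]]
    cps, segs = cosegment(passive, active, penalty=1.0)
    print("change points:", cps)
    for s, e, r, flagged in segs:
        print(f"[{s}, {e}) r={r:+.2f} cosegment={flagged}")
```

On this toy series the sketch should report a single change point at index 8. Only the second segment should be flagged, because a global correlation over all 16 points would blend the uncorrelated and correlated regimes. In a fuller pipeline, per-segment summaries of the flagged periods could be passed to a standard classifier for the prediction step. That step is omitted here.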