Department of Biostatistics, Virginia Commonwealth University, Richmond, Virginia, USA.
Department of Epidemiology, Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA.
Stat Med. 2024 Oct 15;43(23):4559-4574. doi: 10.1002/sim.10194. Epub 2024 Aug 13.
Increasingly, large, nationally representative health and behavioral surveys conducted under a multistage stratified sampling scheme collect high-dimensional data with correlation structured along some domain (eg, wearable sensor data measured continuously and correlated over time, imaging data with spatiotemporal correlation) with the goal of associating these data with health outcomes. Analysis of this sort requires novel methodologic work at the intersection of survey statistics and functional data analysis. Here, we address this crucial gap in the literature by proposing an estimation and inferential framework for generalizable scalar-on-function regression models for data collected under a complex survey design. We propose to: (1) estimate functional regression coefficients using weighted score equations; and (2) perform inference using novel functional balanced repeated replication and survey-weighted bootstrap for multistage survey designs. This is the first frequentist study to discuss the estimation of scalar-on-function regression models in the context of complex survey studies and to assess, via a comprehensive simulation study, the validity of various inferential techniques based on resampling methods. We apply our methods to predict mortality from diurnal activity profiles measured by wearable accelerometers, using data from the National Health and Nutrition Examination Survey 2003-2006. The proposed computationally efficient methods are implemented in the R software package surveySoFR.
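The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the surveySoFR API and not necessarily the authors' estimator. It assumes a Gaussian outcome (so the weighted score equations reduce to weighted least squares), an equally spaced grid on [0, 1], a small cosine basis for the coefficient function, and a Rao-Wu style rescaled bootstrap over primary sampling units (PSUs) in place of the paper's functional balanced repeated replication. All function names and parameters are hypothetical.

```python
import numpy as np

def cosine_basis(s, n_basis):
    # Columns are cos(pi * k * s), k = 0..n_basis-1 (assumed basis, s in [0, 1]).
    return np.cos(np.pi * np.outer(s, np.arange(n_basis)))

def fit_weighted_sofr(y, X, s, weights, n_basis=5, ridge=1e-8):
    """Solve survey-weighted score equations for y_i = a + int X_i(s) beta(s) ds + e_i.

    For a Gaussian outcome the weighted score equations are
    Z' W (y - Z c) = 0, i.e. weighted least squares with the survey weights.
    """
    B = cosine_basis(s, n_basis)
    ds = s[1] - s[0]  # assumes an equally spaced grid
    # Riemann-sum approximation of the integral of X_i(s) * B_k(s).
    Z = np.column_stack([np.ones(len(y)), X @ B * ds])
    WZ = Z * weights[:, None]
    # A tiny ridge term keeps the linear system numerically stable.
    coef = np.linalg.solve(Z.T @ WZ + ridge * np.eye(Z.shape[1]), WZ.T @ y)
    return coef[0], B @ coef[1:]  # intercept, beta(s) evaluated on the grid

def rao_wu_replicate_weights(weights, strata, psu, rng):
    """One set of rescaled bootstrap weights (needs >= 2 PSUs per stratum)."""
    new_w = np.zeros_like(weights, dtype=float)
    for h in np.unique(strata):
        in_h = strata == h
        psus = np.unique(psu[in_h])
        n_h = len(psus)
        # Resample n_h - 1 PSUs with replacement within the stratum.
        draws = rng.choice(psus, size=n_h - 1, replace=True)
        for p in psus:
            sel = in_h & (psu == p)
            new_w[sel] = weights[sel] * np.sum(draws == p) * n_h / (n_h - 1)
    return new_w

def bootstrap_se(y, X, s, weights, strata, psu, n_boot=200, seed=0):
    """Pointwise bootstrap standard error of the estimated beta(s)."""
    rng = np.random.default_rng(seed)
    reps = np.array([
        fit_weighted_sofr(y, X, s, rao_wu_replicate_weights(weights, strata, psu, rng))[1]
        for _ in range(n_boot)
    ])
    return reps.std(axis=0, ddof=1)
```

Here `X` is an n-by-T matrix of functional predictors evaluated on the grid `s`, and `strata` and `psu` are integer label arrays of length n, with PSU labels interpreted within each stratum. A binary outcome such as mortality would need a weighted logistic score solved iteratively rather than the closed-form solve used in this sketch.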