Loewinger Gabriel, Levis Alexander W, Cui Erjia, Pereira Francisco
Machine Learning Core, National Institute of Mental Health.
Department of Statistics & Data Science, Carnegie Mellon University.
ArXiv. 2025 Jun 25:arXiv:2506.20437v1.
Longitudinal binary or count functional data are common in neuroscience but are often too large to analyze with existing functional regression methods. We propose a one-step penalized generalized estimating equations (GEE) estimator that supports continuous, count, or binary functional outcomes and remains fast even when datasets have many clusters and large cluster sizes. The method accommodates both functional and scalar covariates, and the one-step estimation framework enables efficient smoothing parameter selection, bootstrapping, and joint confidence interval construction. Importantly, this semi-parametric approach yields coefficient confidence intervals that are provably asymptotically valid even when the working correlation is misspecified. By developing a general theory for adaptive one-step M-estimation, we prove that the coefficient estimates are asymptotically normal and as efficient as the fully iterated estimator; we verify these theoretical properties in extensive simulations. Finally, we apply our method to a previously published calcium imaging dataset and show that it reveals important timing effects obscured in previous non-functional analyses. In doing so, we demonstrate scaling to common neuroscience dataset sizes: the one-step estimator fits a dataset with 150,000 (binary) functional outcomes, each observed at 120 functional domain points, in only about 13.5 minutes on a laptop without parallelization. We release our implementation in the fastFGEE package.
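To make the one-step idea concrete, the following is a minimal sketch of a generic one-step penalized GEE update for clustered binary outcomes. It is a textbook-style construction under stated assumptions, not the fastFGEE implementation or its API. It starts from a working-independence logistic fit, estimates an exchangeable working correlation from Pearson residuals, takes a single Newton-type step on the penalized estimating equation, and forms a cluster-robust sandwich covariance. The function names, the simulated data, the ridge-type penalty (`lam`, `P`), and the clipping of `alpha` are all illustrative assumptions, and the sketch treats scalar rather than functional outcomes.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_independence_logit(X, y, n_iter=25):
    """Initial estimate: ordinary logistic regression via Newton/IRLS,
    ignoring within-cluster correlation (working independence)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = expit(X @ beta)
        w = mu * (1.0 - mu)
        beta = beta + np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
    return beta

def one_step_gee(X, y, beta0, lam=0.0, P=None):
    """One Newton-type step on a penalized GEE with an exchangeable working
    correlation. X has shape (n_clusters, cluster_size, p); y has shape
    (n_clusters, cluster_size). Returns (beta1, sandwich_cov, alpha)."""
    n, m, p = X.shape
    P = np.zeros((p, p)) if P is None else P
    mu = expit(X @ beta0)                 # fitted means, shape (n, m)
    a = mu * (1.0 - mu)                   # Bernoulli variance function
    r = (y - mu) / np.sqrt(a)             # Pearson residuals

    # Moment estimate of the exchangeable correlation from off-diagonal
    # residual products; clipped to [0, 0.95] as a simplification (assumption).
    off_diag = r.sum(axis=1) ** 2 - (r ** 2).sum(axis=1)
    alpha = float(np.clip(off_diag.sum() / (n * m * (m - 1)), 0.0, 0.95))
    R_inv = np.linalg.inv((1.0 - alpha) * np.eye(m) + alpha * np.ones((m, m)))

    # With D_i = A_i X_i and V_i = A_i^{1/2} R A_i^{1/2}, the cluster score is
    # Xs_i' R^{-1} r_i and the cluster information is Xs_i' R^{-1} Xs_i,
    # where Xs_i = A_i^{1/2} X_i.
    Xs = np.sqrt(a)[:, :, None] * X
    RX = np.einsum("mk,ikp->imp", R_inv, Xs)      # R^{-1} Xs_i per cluster
    scores = np.einsum("imp,im->ip", RX, r)       # shape (n, p)
    H = np.einsum("imp,imq->pq", Xs, RX)          # summed information

    H_pen = H + lam * P
    beta1 = beta0 + np.linalg.solve(H_pen, scores.sum(axis=0) - lam * P @ beta0)

    # Sandwich covariance with cluster scores evaluated at beta0.
    bread = np.linalg.inv(H_pen)
    cov = bread @ (scores.T @ scores) @ bread
    return beta1, cov, alpha

# Simulated example (hypothetical data): 200 clusters of size 5, with a
# cluster-level random intercept inducing within-cluster correlation.
rng = np.random.default_rng(0)
n, m, p = 200, 5, 3
X = rng.normal(size=(n, m, p))
X[..., 0] = 1.0                                   # intercept column
beta_true = np.array([-0.5, 1.0, -1.0])
b = rng.normal(size=(n, 1))
y = rng.binomial(1, expit(X @ beta_true + b)).astype(float)

beta0 = fit_independence_logit(X.reshape(-1, p), y.reshape(-1))
beta1, cov, alpha = one_step_gee(X, y, beta0, lam=1e-3, P=np.eye(p))
se = np.sqrt(np.diag(cov))                        # robust standard errors
```

Because the update needs only one pass over the clusters, refitting under a different smoothing parameter or on a bootstrap resample costs a single linear solve rather than a full iterative fit, which is the kind of saving the abstract attributes to the one-step framework.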