Luo Lan, Li Lexin
Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, USA.
Department of Biostatistics and Epidemiology, University of California, Berkeley, Berkeley, California, USA.
Stat Med. 2022 Nov 10;41(25):5113-5133. doi: 10.1002/sim.9557. Epub 2022 Aug 19.
In this article, we tackle the estimation and inference problem of analyzing distributed streaming data that are collected continuously across multiple data sites. We propose an online two-way approach based on linear mixed-effects models. We explicitly model the site-specific effects as random-effect terms, which lets us account for both between-site heterogeneity and within-site correlation. We develop an online updating procedure that does not need to re-access previous data and can efficiently update the parameter estimate when either new data sites or new streams of sample observations from existing data sites become available. We derive a non-asymptotic error bound for the proposed online estimator and show that it is asymptotically equivalent to its offline counterpart computed from all the raw data. We compare our method with key alternative solutions both analytically and numerically and demonstrate its advantages. We further illustrate the method with two data applications.
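To make the "two-way" updating idea concrete, here is a minimal Python sketch. It is not the authors' estimator. It assumes a random-intercept model y_ij = x_ij^T β + b_i + ε_ij with b_i ~ N(0, τ²) and ε_ij ~ N(0, σ²), and, as a simplifying assumption not taken from the paper, treats σ² and τ² as known. Under that assumption the Sherman-Morrison identity gives V_i^{-1} = σ^{-2}(I − c_i 11^T) with c_i = τ²/(σ² + n_i τ²). Each site can therefore be summarized by n_i, X_i^T X_i, X_i^T y_i, X_i^T 1 and 1^T y_i. Because c_i depends on the current n_i, the summaries are kept per site rather than pooled. The class and function names (`SiteStats`, `OnlineRandomInterceptGLS`, `solve`) are hypothetical.

```python
"""Sketch: online GLS for a random-intercept linear mixed model.

Assumption (not from the paper): variance components sigma2 and tau2 are known.
"""


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


class SiteStats:
    """Running summaries for one site; raw rows are not stored."""

    def __init__(self, p):
        self.n = 0                                # n_i
        self.xtx = [[0.0] * p for _ in range(p)]  # X_i^T X_i
        self.xty = [0.0] * p                      # X_i^T y_i
        self.xsum = [0.0] * p                     # X_i^T 1
        self.ysum = 0.0                           # 1^T y_i

    def update(self, X, y):
        for row, yi in zip(X, y):
            self.n += 1
            self.ysum += yi
            for a, xa in enumerate(row):
                self.xsum[a] += xa
                self.xty[a] += xa * yi
                for b, xb in enumerate(row):
                    self.xtx[a][b] += xa * xb


class OnlineRandomInterceptGLS:
    """Hypothetical two-way online estimator of the fixed effects beta."""

    def __init__(self, p, sigma2, tau2):
        self.p, self.sigma2, self.tau2 = p, sigma2, tau2
        self.sites = {}

    def update(self, site, X, y):
        # An unseen site key adds a new data site; a known key extends that
        # site's summaries with a new stream of observations.
        if site not in self.sites:
            self.sites[site] = SiteStats(self.p)
        self.sites[site].update(X, y)

    def estimate(self):
        p = self.p
        A = [[0.0] * p for _ in range(p)]  # sum_i X_i^T V_i^{-1} X_i
        rhs = [0.0] * p                    # sum_i X_i^T V_i^{-1} y_i
        for s in self.sites.values():
            # c_i depends on the current n_i, so it is recomputed on each call.
            c = self.tau2 / (self.sigma2 + s.n * self.tau2)
            for a in range(p):
                rhs[a] += (s.xty[a] - c * s.xsum[a] * s.ysum) / self.sigma2
                for k in range(p):
                    A[a][k] += (s.xtx[a][k] - c * s.xsum[a] * s.xsum[k]) / self.sigma2
        return solve(A, rhs)
```

For example, calling `update("A", ...)` twice and `update("B", ...)` once gives the same estimate as one call per site with all rows, because only per-site sums are retained. With known variance components this online estimate equals the offline GLS estimate exactly. The paper's estimator, which this sketch does not reproduce, is shown to be asymptotically equivalent to its offline counterpart.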