Institute of Data Science, Maastricht University, Maastricht, The Netherlands.
Tilburg University, Tilburg, The Netherlands.
Psychometrika. 2019 Mar;84(1):41-64. doi: 10.1007/s11336-018-09656-z. Epub 2019 Jan 22.
Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and consume considerable time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or "row-by-row"). In an extensive simulation study, we compare the performance of SEMA with that of traditional methods for fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive with current state-of-the-art methods, while SEMA is orders of magnitude faster.
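The key idea of an online ("row-by-row") fit is that each incoming observation updates a small set of stored summaries instead of triggering a pass over the full data set. The sketch below is NOT the SEMA algorithm: it swaps SEMA's approximate EM updates for a simpler, exact method-of-moments (one-way ANOVA) estimator of a random-intercept model y_ij = mu + u_j + e_ij, purely to illustrate the per-group bookkeeping that makes each update O(1) in time and O(number of groups) in memory. The class name StreamingRandomIntercept, its methods, and the simulated parameter values are hypothetical.

```python
from dataclasses import dataclass, field
import random


@dataclass
class StreamingRandomIntercept:
    """Hypothetical sketch: row-by-row moment (ANOVA) estimates for
    y_ij = mu + u_j + e_ij. Only summary statistics are stored; raw rows
    are never kept. This is not SEMA's EM-based update."""
    n: dict = field(default_factory=dict)  # n_j: rows seen per group
    s: dict = field(default_factory=dict)  # S_j: sum of y per group
    N: int = 0        # total rows seen
    T: float = 0.0    # sum of all y
    Q: float = 0.0    # sum of all y^2
    G: float = 0.0    # sum over groups of S_j^2 / n_j
    H: int = 0        # sum over groups of n_j^2

    def update(self, group, y):
        """Absorb one row in O(1) time without revisiting earlier rows."""
        n_old = self.n.get(group, 0)
        s_old = self.s.get(group, 0.0)
        if n_old > 0:
            # Remove this group's old contribution before replacing it.
            self.G -= s_old * s_old / n_old
            self.H -= n_old * n_old
        n_new, s_new = n_old + 1, s_old + y
        self.n[group], self.s[group] = n_new, s_new
        self.G += s_new * s_new / n_new
        self.H += n_new * n_new
        self.N += 1
        self.T += y
        self.Q += y * y

    def estimates(self):
        """Return (mu, tau2, sigma2); requires >= 2 groups and N > J."""
        J = len(self.n)
        ssw = self.Q - self.G                # within-group sum of squares
        ssb = self.G - self.T ** 2 / self.N  # between-group sum of squares
        sigma2 = ssw / (self.N - J)
        n0 = (self.N - self.H / self.N) / (J - 1)
        tau2 = max(0.0, (ssb / (J - 1) - sigma2) / n0)  # truncated at zero
        mu = self.T / self.N                 # grand mean, not the GLS estimate
        return mu, tau2, sigma2


# Usage: simulate 200 groups x 20 rows (assumed mu = 5, tau = 2, sigma = 1)
# and feed the rows in shuffled order, as if they were streaming in.
rng = random.Random(0)
rows = []
for j in range(200):
    u = rng.gauss(0.0, 2.0)
    for _ in range(20):
        rows.append((j, 5.0 + u + rng.gauss(0.0, 1.0)))
rng.shuffle(rows)

model = StreamingRandomIntercept()
for group, y in rows:
    model.update(group, y)
mu, tau2, sigma2 = model.estimates()
```

Because the stored sums are exact, the streamed estimates match a batch computation on the same rows. Quantities whose per-group weights depend on the current variance estimates, such as the GLS estimate of mu, cannot be maintained exactly from fixed sums like these without revisiting every group; that is where an approximate update scheme becomes necessary.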