Estimating Multilevel Models on Data Streams.

Affiliations

Institute of Data Science, Maastricht University, Maastricht, The Netherlands.

Tilburg University, Tilburg, The Netherlands.

Publication

Psychometrika. 2019 Mar;84(1):41-64. doi: 10.1007/s11336-018-09656-z. Epub 2019 Jan 22.

DOI: 10.1007/s11336-018-09656-z

PMID: 30671789

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6373343/

Abstract

Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and end up consuming considerable time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or "row-by-row"). In an extensive simulation study, we compare the performance of SEMA with that of traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive with current state-of-the-art methods while being orders of magnitude faster.
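The abstract contrasts algorithms that revisit every data point with fitting "row-by-row". As a minimal sketch of that general idea only, assuming a simple random-intercept model y_ij = mu + u_j + e_ij, the Python class below keeps per-group running sums so that each new row is absorbed in constant time and earlier rows are never revisited. It is not SEMA: it swaps in a method-of-moments (ANOVA) estimator of the variance components instead of an EM-based one, and the names `OnlineRandomIntercept`, `update`, and `estimate` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GroupStats:
    n: int = 0          # number of observations seen for this group
    total: float = 0.0  # running sum of y
    sumsq: float = 0.0  # running sum of y**2


class OnlineRandomIntercept:
    """Row-by-row ANOVA (method-of-moments) estimator for y_ij = mu + u_j + e_ij.

    Hypothetical illustration, NOT the SEMA algorithm: per-group sufficient
    statistics let each streamed row be absorbed in O(1) time.
    """

    def __init__(self) -> None:
        self.groups: dict = defaultdict(GroupStats)

    def update(self, group, y: float) -> None:
        # Absorb one streamed observation; earlier rows are not stored.
        g = self.groups[group]
        g.n += 1
        g.total += y
        g.sumsq += y * y

    def estimate(self) -> tuple[float, float, float]:
        # Returns (mu_hat, sigma2_hat, tau2_hat) from the current statistics.
        stats = list(self.groups.values())
        J = len(stats)
        N = sum(g.n for g in stats)
        if J < 2 or N <= J:
            raise ValueError("need at least 2 groups and more rows than groups")
        grand_total = sum(g.total for g in stats)
        mu_hat = grand_total / N  # unweighted grand mean, not the GLS estimate
        ssw = sum(g.sumsq - g.total ** 2 / g.n for g in stats)  # within-group SS
        ssb = sum(g.total ** 2 / g.n for g in stats) - grand_total ** 2 / N
        msw = ssw / (N - J)
        msb = ssb / (J - 1)
        # Effective group size, which handles unbalanced group sizes.
        n0 = (N - sum(g.n ** 2 for g in stats) / N) / (J - 1)
        sigma2_hat = msw
        tau2_hat = max(0.0, (msb - msw) / n0)  # truncate negative estimates at 0
        return mu_hat, sigma2_hat, tau2_hat


if __name__ == "__main__":
    model = OnlineRandomIntercept()
    for group, y in [("a", 1.0), ("b", 5.0), ("a", 3.0), ("b", 7.0)]:
        model.update(group, y)
    print(model.estimate())  # (4.0, 2.0, 7.0)
```

Only three numbers per group are kept, so memory grows with the number of groups rather than the number of rows; any streaming scheme built on sufficient statistics shares this property, whatever estimator is then applied to them.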

Similar articles

Incorporating Mobility in Growth Modeling for Multilevel and Longitudinal Item Response Data.

Multivariate Behav Res. 2016;51(1):120-37. doi: 10.1080/00273171.2015.1114911.

The relationship between multilevel models and non-parametric multilevel mixture models: Discrete approximation of intraclass correlation, random coefficient distributions, and residual heteroscedasticity.

Br J Math Stat Psychol. 2016 Nov;69(3):316-343. doi: 10.1111/bmsp.12073.

Fair Max-Min Diversity Maximization in Streaming and Sliding-Window Models.

Entropy (Basel). 2023 Jul 14;25(7):1066. doi: 10.3390/e25071066.

Modelling partially cross-classified multilevel data.

Br J Math Stat Psychol. 2015 May;68(2):342-62. doi: 10.1111/bmsp.12050. Epub 2015 Mar 13.

Using Multilevel Models and Generalized Estimating Equation Models to Account for Clustering in Neurology Clinical Research.

Neurology. 2024 Nov 12;103(9):e209947. doi: 10.1212/WNL.0000000000209947. Epub 2024 Oct 11.

Multilevel Conditional Autoregressive models for longitudinal and spatially referenced epidemiological data.

Spat Spatiotemporal Epidemiol. 2022 Jun;41:100477. doi: 10.1016/j.sste.2022.100477. Epub 2022 Jan 29.

A New Multilevel CART Algorithm for Multilevel Data with Binary Outcomes.

Multivariate Behav Res. 2019 Jul-Aug;54(4):578-592. doi: 10.1080/00273171.2018.1552555. Epub 2019 Jan 15.

Multiple imputation of missing data in multilevel designs: A comparison of different strategies.

Psychol Methods. 2017 Mar;22(1):141-165. doi: 10.1037/met0000096. Epub 2016 Sep 8.

Estimating Standardized Effect Sizes for Two- and Three-Level Partially Nested Data.

Multivariate Behav Res. 2016 Nov-Dec;51(6):740-756. doi: 10.1080/00273171.2016.1231606. Epub 2016 Nov 1.

Cited by

Fast meta-analytic approximations for relational event models: applications to data streams and multilevel data.

J Comput Soc Sci. 2024;7(2):1823-1859. doi: 10.1007/s42001-024-00290-7. Epub 2024 Jun 8.
