Estimating Multilevel Models on Data Streams.

Author information

Institute of Data Science, Maastricht University, Maastricht, The Netherlands.

Tilburg University, Tilburg, The Netherlands.

Publication information

Psychometrika. 2019 Mar;84(1):41-64. doi: 10.1007/s11336-018-09656-z. Epub 2019 Jan 22.

Abstract

Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and end up consuming much time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or "row-by-row"). In an extensive simulation study, we demonstrate the performance of SEMA compared to traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive to current state-of-the-art methods while being orders of magnitude faster.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/139c/6373343/4fffbf157966/11336_2018_9656_Fig1_HTML.jpg
