Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.

Author information

Kidziński Łukasz, Hui Francis K C, Warton David I, Hastie Trevor

Affiliations

Department of Bioengineering, Stanford University, Stanford, CA 94305, USA.

Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, ACT 2601, Australia.

Publication information

J Mach Learn Res. 2022 Nov;23.

PMID: 37102181

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10129058/

Abstract

Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.
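The fitting idea in the abstract (a penalized likelihood optimized with Newton and Fisher scoring steps) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Poisson response with a log link, omits covariates and intercepts, uses a plain ridge penalty on both factor matrices, and alternates damped Fisher scoring updates between them. All function names, the penalty weight `lam`, the step size `step`, and the simulated data are hypothetical choices made for illustration.

```python
import numpy as np

def penalized_nll(Y, U, V, lam):
    """Poisson negative log-likelihood (up to a constant) plus a ridge penalty."""
    eta = U @ V.T
    return np.sum(np.exp(eta) - Y * eta) + 0.5 * lam * (np.sum(U**2) + np.sum(V**2))

def scoring_update(Y, A, B, lam, step):
    """Damped Fisher scoring update for each row of A with B held fixed.

    Assumes Y[i, j] ~ Poisson(exp(A[i] @ B[j])) with a log link.
    """
    eta = A @ B.T
    mu = np.exp(eta)  # Poisson mean; also the Fisher weight under the log link
    d = A.shape[1]
    A_new = np.empty_like(A)
    for i in range(A.shape[0]):
        # Penalized expected information: B^T W B + lam * I, with W = diag(mu[i]).
        info = B.T @ (mu[i][:, None] * B) + lam * np.eye(d)
        # Penalized score: B^T (y - mu) - lam * a.
        score = B.T @ (Y[i] - mu[i]) - lam * A[i]
        A_new[i] = A[i] + step * np.linalg.solve(info, score)
    return A_new

def fit_poisson_factorization(Y, d=2, lam=1.0, step=0.5, n_iter=30, seed=0):
    """Alternate scoring updates on row factors U and column factors V."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    history = [penalized_nll(Y, U, V, lam)]
    for _ in range(n_iter):
        U = scoring_update(Y, U, V, lam, step)
        V = scoring_update(Y.T, V, U, lam, step)
        history.append(penalized_nll(Y, U, V, lam))
    return U, V, history

# Simulated counts from a rank-2 latent structure (illustrative data only).
rng = np.random.default_rng(1)
U_true = 0.5 * rng.standard_normal((60, 2))
V_true = 0.5 * rng.standard_normal((40, 2))
Y = rng.poisson(np.exp(U_true @ V_true.T)).astype(float)

U_hat, V_hat, history = fit_poisson_factorization(Y)
print("penalized objective, first and last:", history[0], history[-1])
```

Each row update only requires solving a small d-by-d linear system, which hints at why this style of update can scale to matrices with many rows and columns. The damping factor is a pragmatic choice in this sketch to keep the undamped Newton step from overshooting.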
