Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.

Authors

Kidziński Łukasz, Hui Francis K C, Warton David I, Hastie Trevor

Affiliations

Department of Bioengineering, Stanford University, Stanford, CA 94305, USA.

Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, ACT 2601, Australia.

Publication information

J Mach Learn Res. 2022 Nov;23.

PMID: 37102181
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10129058/

Abstract

Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable Models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method to a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.
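
To make the idea of fitting a latent factor model by penalized likelihood and Fisher scoring more concrete, here is a minimal sketch. It is not the authors' published implementation. It assumes a Poisson response with a log link, a ridge penalty on both factor matrices, and alternating row-wise updates with step halving. The function names (`penalized_nll`, `fisher_step`, `fit_poisson_gmf`) and the parameters `d`, `lam`, and `n_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def penalized_nll(Y, U, V, lam):
    """Poisson negative log-likelihood (up to a constant) plus a ridge penalty."""
    eta = np.clip(U @ V.T, -20, 20)  # linear predictor, clipped for numerical safety
    return np.sum(np.exp(eta) - Y * eta) + 0.5 * lam * (np.sum(U**2) + np.sum(V**2))


def fisher_step(Y, A, B, lam):
    """One ridge-penalized Fisher scoring update of every row of A, with B held fixed.

    Assumed model: E[Y] = exp(A @ B.T). For the Poisson log link, each row update
    is a weighted ridge regression of a working response on B.
    """
    eta = np.clip(A @ B.T, -20, 20)
    mu = np.exp(eta)
    W = mu                    # Fisher weights equal the mean for Poisson/log link
    Z = eta + (Y - mu) / mu   # working response
    d = B.shape[1]
    A_new = np.empty_like(A)
    for i in range(A.shape[0]):
        BtW = B.T * W[i]      # scales column j of B.T by the weight W[i, j]
        A_new[i] = np.linalg.solve(BtW @ B + lam * np.eye(d), BtW @ Z[i])
    return A_new


def _backtrack(objective, old, proposal, max_halvings=20):
    """Halve the step toward `proposal` until the objective does not increase."""
    f_old = objective(old)
    step = 1.0
    for _ in range(max_halvings):
        candidate = old + step * (proposal - old)
        if objective(candidate) <= f_old:
            return candidate
        step *= 0.5
    return old  # no acceptable step found: keep the current value


def fit_poisson_gmf(Y, d=2, lam=1.0, n_iter=30, seed=0):
    """Alternate penalized Fisher scoring updates for U (rows) and V (columns)."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    history = [penalized_nll(Y, U, V, lam)]
    for _ in range(n_iter):
        U = _backtrack(lambda A: penalized_nll(Y, A, V, lam),
                       U, fisher_step(Y, U, V, lam))
        V = _backtrack(lambda B: penalized_nll(Y, U, B, lam),
                       V, fisher_step(Y.T, V, U, lam))
        history.append(penalized_nll(Y, U, V, lam))
    return U, V, history


if __name__ == "__main__":
    # Simulated count data with a rank-2 latent structure (illustrative only).
    rng = np.random.default_rng(1)
    U_true = 0.5 * rng.standard_normal((40, 2))
    V_true = 0.5 * rng.standard_normal((30, 2))
    Y = rng.poisson(np.exp(U_true @ V_true.T)).astype(float)
    U, V, history = fit_poisson_gmf(Y, d=2, lam=1.0, n_iter=10)
    print("objective at start and end:", history[0], history[-1])
```

Each row update solves only a small d-by-d linear system, so the work per sweep grows with the number of matrix entries rather than requiring one large joint Hessian. The step halving is a simple safeguard added for this sketch and keeps the penalized objective from increasing between sweeps.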
