Executive Secretariat, National Research and Innovation Agency (BRIN), DKI Jakarta, 10340, Indonesia.
Department of Information Management, College of Informatics, Chaoyang University of Technology, Taichung City, 41349, Taiwan.
BMC Med Res Methodol. 2022 Mar 21;22(1):77. doi: 10.1186/s12874-022-01538-4.
In health data mining and machine learning, dimension reduction is needed to remove multicollinearity. It has also been shown to improve the interpretability of model parameters. In addition, dimension reduction can reduce computing time for high-dimensional data.
In this paper, we perform high-dimensional ordination of event counts from hospital intensive care units: the Emergency Department (ED1), First Intensive Care Unit (ICU1), Second Intensive Care Unit (ICU2), Respiratory Intensive Care Unit (RICU), Surgical Intensive Care Unit (SICU), Subacute Respiratory Care Unit (RCC), Trauma and Neurosurgery Intensive Care Unit (TNCU), and Neonatal Intensive Care Unit (NICU), using Generalized Linear Latent Variable Models (GLLVMs).
In the analysis, we measure the performance and computing time of GLLVMs fitted with the variational approximation and the Laplace approximation, and compare different response distributions: Negative Binomial, Poisson, Gaussian, zero-inflated Poisson (ZIP), and Tweedie. GLLVMs, an extension of Generalized Linear Models (GLMs) with latent variables, offer fast computing times. The major challenge in latent variable modelling is that the likelihood [Formula: see text] is not trivial to evaluate, since the marginal likelihood involves integration over the latent variable u.
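For context, the marginal log-likelihood referenced above (shown as [Formula: see text] in this record) takes, in the standard GLLVM formulation, the following form; this is the textbook expression, not reproduced from the paper itself:

```latex
% Standard GLLVM marginal log-likelihood (textbook form, assumed here):
% y_{ij} = response of unit i on variable j, u_i = latent variables with
% density f(u_i), \Theta = model parameters.
\ell(\Theta) = \sum_{i=1}^{n} \log \int
    \prod_{j=1}^{m} f\!\left(y_{ij} \mid u_i, \Theta\right) f(u_i)\,
    \mathrm{d}u_i
```

The Laplace approximation replaces each integral with a Gaussian expansion around its mode, while the variational approximation maximizes a tractable lower bound on this log-likelihood; both avoid the intractable integration over u.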
In a nutshell, GLLVMs performed best, explaining 98% of the variance compared with the other methods. The Negative Binomial distribution combined with the variational approximation gave the best fit, as judged by AIC, AICc, and BIC. Our best models are GLLVM-VA Negative Binomial with AIC 7144.07 and GLLVM-LA Negative Binomial with AIC 6955.922.
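As a minimal sketch of the model-comparison step, the information criteria named above can be computed directly from a fitted model's log-likelihood; the log-likelihood values, parameter counts, and sample size below are hypothetical illustrations, not values taken from the paper:

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik

def aicc(loglik, k, n):
    """Small-sample corrected AIC (n = number of observations)."""
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * loglik

# Hypothetical log-likelihoods and parameter counts for two fitted models
models = {"GLLVM-VA NB": (-3540.0, 32), "GLLVM-VA Poisson": (-3720.0, 24)}
n = 100  # hypothetical number of observations
for name, (ll, k) in models.items():
    print(f"{name}: AIC={aic(ll, k):.2f} "
          f"AICc={aicc(ll, k, n):.2f} BIC={bic(ll, k, n):.2f}")
```

The model with the lowest criterion value is preferred; AICc adds a penalty that matters when the parameter count is large relative to the sample size, as is common in latent variable models.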