Comparison of multiple regression to two latent variable techniques for estimation and prediction.

Author information

Wall Melanie M, Li Ruifeng

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.

Publication information

Stat Med. 2003 Dec 15;22(23):3671-85. doi: 10.1002/sim.1588.

DOI:10.1002/sim.1588

PMID:14652868

Abstract

In the areas of epidemiology, psychology, sociology, and other social and behavioural sciences, researchers often encounter situations where there are not only many variables contributing to a particular phenomenon, but there are also strong relationships among many of the predictor variables of interest. By using the traditional multiple regression on all the predictor variables, it is possible to have problems with interpretation and multicollinearity. As an alternative to multiple regression, we explore the use of a latent variable model that can address the relationship among the predictor variables. We consider two different methods for estimation and prediction for this model: one that uses multiple regression on factor score estimates and the other that uses structural equation modelling. The first method uses multiple regression but on a set of predicted underlying factors (i.e. factor scores), and the second method is a full-information maximum-likelihood technique that incorporates the complete covariance structure of the data. In this tutorial, we will explain the model and each estimation method, including how to carry out prediction. A data example will be used for demonstration, where respiratory disease death rates by county in Minnesota are predicted by five county-level census variables. A simulation study is performed to evaluate the efficiency of prediction using the two latent variable modelling techniques compared to multiple regression.
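The first approach described in the abstract, multiple regression on factor score estimates, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are simulated (six predictors driven by two latent factors, not the Minnesota county data), loadings are estimated by iterated principal-axis factoring rather than by whatever estimator the authors used, and factor scores are computed with the regression method. The function names are hypothetical. The structural equation modelling approach is not shown.

```python
import numpy as np


def simulate(n, rng):
    """Simulate six correlated predictors driven by two latent factors (hypothetical values)."""
    loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                         [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
    factors = rng.standard_normal((n, 2))
    x = factors @ loadings.T + 0.5 * rng.standard_normal((n, 6))
    y = 1.0 * factors[:, 0] - 0.5 * factors[:, 1] + 0.5 * rng.standard_normal(n)
    return x, y


def principal_axis_loadings(corr, n_factors, n_iter=100):
    """Estimate factor loadings by iterated principal-axis factoring (an assumed estimator)."""
    # Start from squared multiple correlations as communality estimates.
    communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(n_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        communalities = np.clip((loadings ** 2).sum(axis=1), 0.0, 1.0)
    return loadings


def fit_factor_score_regression(x, y, n_factors):
    """Regress y on regression-method factor score estimates of the predictors."""
    mean, std = x.mean(axis=0), x.std(axis=0, ddof=1)
    z = (x - mean) / std
    corr = np.corrcoef(z, rowvar=False)
    loadings = principal_axis_loadings(corr, n_factors)
    weights = np.linalg.solve(corr, loadings)  # score weights R^{-1} Lambda
    design = np.column_stack([np.ones(len(y)), z @ weights])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "weights": weights, "beta": beta}


def predict_factor_score_regression(model, x_new):
    """Predict y for new rows using the training-sample scaling and score weights."""
    z = (x_new - model["mean"]) / model["std"]
    scores = z @ model["weights"]
    return model["beta"][0] + scores @ model["beta"][1:]


def fit_ols(x, y):
    """Ordinary multiple regression of y on all predictors, with an intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_ols(beta, x_new):
    return beta[0] + x_new @ beta[1:]


rng = np.random.default_rng(0)
x_train, y_train = simulate(300, rng)
x_test, y_test = simulate(300, rng)

fs_model = fit_factor_score_regression(x_train, y_train, n_factors=2)
ols_beta = fit_ols(x_train, y_train)

mse_fs = np.mean((y_test - predict_factor_score_regression(fs_model, x_test)) ** 2)
mse_ols = np.mean((y_test - predict_ols(ols_beta, x_test)) ** 2)
print(f"factor-score regression test MSE: {mse_fs:.3f}")
print(f"multiple regression test MSE:     {mse_ols:.3f}")
```

The two printed values give a rough out-of-sample comparison in the spirit of the simulation study the abstract mentions. The result depends entirely on the simulated setup and should not be read as reproducing the paper's findings.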
