Collier Zachary K, Leite Walter L, Karpyn Allison
University of Delaware, Newark, DE, USA.
University of Florida, Gainesville, FL, USA.
Eval Rev. 2021 Mar 3:193841X21992199. doi: 10.1177/0193841X21992199.
The generalized propensity score (GPS) addresses selection bias due to observed confounding variables and makes it possible to estimate the causal effects of continuous treatment doses in propensity score analyses. Estimating the GPS with parametric models requires conditions that are rarely met in practice, such as correct model specification, normally distributed variables, and large sample sizes.
The purpose of this Monte Carlo simulation study is to examine the performance of neural networks as compared to full factorial regression models to estimate GPS in the presence of Gaussian and skewed treatment doses and small to moderate sample sizes.
A detailed conceptual introduction of neural networks is provided, as well as an illustration of selection of hyperparameters to estimate GPS. An example from public health and nutrition literature uses residential distance as a treatment variable to illustrate how neural networks can be used in a propensity score analysis to estimate a dose-response function of grocery spending behaviors.
Compared with the scores from regression models, the GPS estimated by neural networks correlated substantially more strongly with the true GPS and had lower mean squared error. The implication is that more selection bias was removed using GPS estimated with neural networks than using GPS estimated with classical regression.
This study proposes neural networks as a new procedure for estimating GPS. Neural networks do not depend on the assumptions of linear regression and other parametric models, and they have been shown to be competitive with parametric approaches for estimating propensity scores for continuous treatments.
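The evaluation idea described above (estimate the GPS, then compare it with the true GPS by correlation and mean squared error) can be illustrated with a minimal sketch. It is not the authors' simulation design: it covers only the parametric baseline, using a correctly specified simple linear regression rather than a neural network, and the data-generating model, coefficients, sample size, and function names are all hypothetical. Under a normal model, the GPS is taken to be the normal density of the observed dose given the covariate.

```python
import random
from statistics import NormalDist, fmean


def simulate(n, seed=0):
    """Hypothetical data: one confounder x, dose ~ Normal(1 + 0.5 * x, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    doses = [1.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    # True GPS: density of the observed dose under the known model.
    true_gps = [NormalDist(1.0 + 0.5 * x, 1.0).pdf(t) for x, t in zip(xs, doses)]
    return xs, doses, true_gps


def estimate_gps_linear(xs, doses):
    """Parametric GPS: regress dose on x, then evaluate the fitted normal density."""
    n = len(xs)
    mean_x, mean_t = fmean(xs), fmean(doses)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, doses))
    slope = sxt / sxx
    intercept = mean_t - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = (sum((t - f) ** 2 for t, f in zip(doses, fitted)) / (n - 2)) ** 0.5
    return [NormalDist(f, sigma).pdf(t) for f, t in zip(fitted, doses)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return fmean([(u - v) ** 2 for u, v in zip(a, b)])


xs, doses, true_gps = simulate(500)
est_gps = estimate_gps_linear(xs, doses)
print(f"correlation = {pearson(true_gps, est_gps):.3f}")
print(f"MSE = {mse(true_gps, est_gps):.6f}")
```

Because the regression here matches the data-generating model, the estimated and true GPS should agree closely. The comparison becomes informative when the dose distribution is skewed or the model is misspecified, which is the setting the study examines.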