Cheng Yu-Jen, Wang Mei-Cheng
Institute of Statistics, National Tsing Hua University, Hsin-Chu 300, Taiwan.
Biometrics. 2012 Sep;68(3):707-16. doi: 10.1111/j.1541-0420.2012.01754.x. Epub 2012 Jul 26.
This article develops semiparametric approaches for estimating propensity scores and causal survival functions from prevalent survival data. The analytical problem arises when prevalent sampling is adopted for collecting failure times and, as a result, the covariates are incompletely observed because of their association with failure time. The proposed procedure for estimating propensity scores shares features with the likelihood formulation in case-control studies, but in our case it requires additional consideration of the intercept term. The results show that, in a logistic regression setting, the corrected propensity scores can be obtained through a standard estimation procedure with specific adjustments to the intercept term. For causal estimation, two different sources of missingness are encountered in our model: one can be explained by the potential outcome framework; the other is caused by the prevalent sampling scheme. Statistical analysis that does not adjust for the bias from both sources of missingness will lead to biased results in causal inference. The proposed methods were partly motivated by, and applied to, the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked data for women diagnosed with breast cancer.
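The abstract compares the propensity score correction to the case-control likelihood, where a standard logistic fit recovers the slope and only the intercept needs adjusting. The sketch below illustrates that classical case-control analogue only; it is not the authors' estimator, and the abstract does not state the specific intercept adjustment required under prevalent sampling. All names and values (`fit_logistic`, `simulate_and_correct`, the retention probabilities `keep1` and `keep0`, the true coefficients) are hypothetical and chosen for illustration.

```python
import math
import random


def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of logit P(y=1|x) = a + b*x; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += H^{-1} g for the 2x2 information matrix H
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate_and_correct(seed=0, n=20000, a0=-1.0, b0=0.8,
                         keep1=1.0, keep0=0.2):
    """Simulate outcome-dependent sampling and undo the intercept shift.

    Assumption (illustrative): units with y=1 are retained with
    probability keep1 and units with y=0 with probability keep0, as in
    a case-control design with known sampling fractions.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(a0 + b0 * x)))
        y = 1 if rng.random() < p else 0
        keep = keep1 if y == 1 else keep0
        if rng.random() < keep:
            xs.append(x)
            ys.append(y)
    a_hat, b_hat = fit_logistic(xs, ys)
    # Biased sampling shifts the intercept by log(keep1 / keep0);
    # the slope is unaffected, so only the intercept is corrected.
    a_corrected = a_hat - math.log(keep1 / keep0)
    return a_hat, b_hat, a_corrected


a_hat, b_hat, a_corrected = simulate_and_correct()
print("naive intercept:", round(a_hat, 3))
print("slope:", round(b_hat, 3))
print("corrected intercept:", round(a_corrected, 3))
```

In this toy setting the standard fit is run unchanged and the known log sampling ratio is subtracted from the intercept afterwards, which mirrors the "standard procedure plus intercept adjustment" structure described above. Under prevalent sampling the selection mechanism depends on failure time rather than on a binary outcome, so the actual adjustment would differ.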