Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA.
Department of Epidemiology, University of Michigan, Ann Arbor, MI, 48109, USA.
Stat Med. 2014 Feb 10;33(3):455-469. doi: 10.1002/sim.5940. Epub 2013 Aug 12.
For binary or categorical response models, most goodness-of-fit statistics are based on partitioning the subjects into groups or regions and comparing the observed and predicted responses in these regions against a suitable chi-squared distribution. Existing strategies create this partition from the predicted response probabilities, or propensity scores, of the fitted model. In this paper, we follow a retrospective approach: borrowing the notion of balancing scores from causal inference, we assess model adequacy by inspecting the conditional distribution of the predictors, given the propensity scores, within each category of the response. This diagnostic can be used under both prospective and retrospective sampling designs, and it may detect general forms of misspecification. We first present simple graphical and numerical summaries for the binary logistic model. We then generalize these tools to propose model diagnostics for the proportional odds model. We illustrate the methods with simulation studies and two data examples: (i) a case-control study of the association between cumulative lead exposure and Parkinson's disease in the Boston, Massachusetts, area and (ii) a cohort study of biomarkers possibly associated with diabetes, from the VA Normative Aging Study.
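The balancing-score idea can be illustrated with a minimal sketch. If the fitted logistic model is correct, the fitted probability acts as a balancing score, so within narrow strata of it the predictor distribution should look similar among subjects with Y=1 and Y=0. The sketch below is not the authors' statistic: it assumes a single continuous predictor, quantile strata of the fitted probability, and a standardized mean difference as the balance summary, all chosen here for illustration. The function names (`fit_logistic`, `balance_by_strata`, `simulate`) and the simulated quadratic true model are hypothetical. Coarse strata leave some residual imbalance even under a correct model, so the output is a rough diagnostic, not a test.

```python
import math
import random
import statistics


def expit(t):
    """Logistic function 1 / (1 + exp(-t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(x, y, iters=50, tol=1e-10):
    """Fit logit P(Y=1|x) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [expit(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * delta = g for the 2x2 system.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def balance_by_strata(z, y, p_hat, n_strata=5):
    """Compare covariate z between Y=1 and Y=0 within quantile strata of p_hat.

    Returns one standardized mean difference per stratum, or None when a
    stratum has fewer than two subjects in either response category.
    """
    order = sorted(range(len(p_hat)), key=lambda i: p_hat[i])
    n = len(order)
    out = []
    for k in range(n_strata):
        idx = order[k * n // n_strata:(k + 1) * n // n_strata]
        z1 = [z[i] for i in idx if y[i] == 1]
        z0 = [z[i] for i in idx if y[i] == 0]
        if len(z1) < 2 or len(z0) < 2:
            out.append(None)
            continue
        m1, m0 = statistics.mean(z1), statistics.mean(z0)
        pooled = math.sqrt((statistics.variance(z1) + statistics.variance(z0)) / 2)
        out.append((m1 - m0) / pooled if pooled > 0 else 0.0)
    return out


def simulate(n=600, seed=1):
    """Hypothetical data: the true logit is quadratic in x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if rng.random() < expit(-0.5 + 0.8 * xi + 0.6 * xi * xi) else 0
         for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    b0, b1 = fit_logistic(x, y)  # misspecified: linear in x only
    p_hat = [expit(b0 + b1 * xi) for xi in x]
    # Balance of x squared, a term omitted from the fitted model.
    print(balance_by_strata([xi * xi for xi in x], y, p_hat))
```

Large, same-signed differences across strata for a function of the predictors (here x squared) suggest that the fitted probability is not balancing the predictors, which points to misspecification of the linear predictor.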