Liu Dungang, Zhang Heping
Assistant Professor, University of Cincinnati Lindner College of Business, Cincinnati, OH 45221.
Susan Dwight Bliss Professor, Yale University School of Public Health, New Haven, CT 06520.
J Am Stat Assoc. 2018;113(522):845-854. doi: 10.1080/01621459.2017.1292915. Epub 2018 Jun 6.
Ordinal outcomes are common in scientific research and everyday practice, and we often rely on regression models to make inference. A long-standing problem with such regression analyses is the lack of effective diagnostic tools for validating model assumptions. The difficulty arises from the fact that an ordinal variable takes discrete values that are labeled with numbers but are not themselves numerical. The values merely represent ordered categories. In this paper, we propose a surrogate approach to defining residuals for an ordinal outcome Y. The idea is to define a continuous variable S as a "surrogate" of Y and then obtain residuals based on S. For the general class of cumulative link regression models, we study the residual's theoretical and graphical properties. We show that the residual has null properties similar to those of the common residuals for continuous outcomes. Our numerical studies demonstrate that the residual has power to detect misspecification with respect to 1) mean structures; 2) link functions; 3) heteroscedasticity; 4) proportionality; and 5) mixed populations. The proposed residual also enables us to develop numeric measures for goodness-of-fit using classical distance notions. Our results suggest that, compared to a previously defined residual, our residual can reveal deeper insights into model diagnostics. We stress that this work focuses on residual analysis rather than hypothesis testing. The latter has limited utility because it provides only a single p-value, whereas our residual can reveal which components of the model are misspecified and advise how to make improvements.
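The surrogate idea can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes a probit link and a latent-variable form of the cumulative link model in which Y = j exactly when alpha_{j-1} < Z <= alpha_j, with Z = eta + eps and eps standard normal. The surrogate S is drawn from the distribution of Z truncated to the interval implied by the observed category, and the residual is S minus its conditional mean eta. The function names, the cutpoints, and the slope used in the simulation are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance

STD_NORMAL = NormalDist()  # link distribution; the probit choice is an assumption


def surrogate_residual(y, eta, cutpoints, rng):
    """Draw one surrogate residual for category y (1-based).

    Assumed model: Y = j iff alpha_{j-1} < Z <= alpha_j, where
    Z = eta + eps and eps ~ N(0, 1). `cutpoints` holds alpha_1..alpha_{J-1}.
    """
    bounds = [-math.inf, *cutpoints, math.inf]
    lo, hi = bounds[y - 1], bounds[y]
    # Probabilities of the truncation limits on the scale of eps = Z - eta.
    p_lo = 0.0 if math.isinf(lo) else STD_NORMAL.cdf(lo - eta)
    p_hi = 1.0 if math.isinf(hi) else STD_NORMAL.cdf(hi - eta)
    # Inverse-CDF sampling from the truncated normal; clamp to keep inv_cdf valid.
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    surrogate = eta + STD_NORMAL.inv_cdf(u)
    # Residual = S - E[S | X]; under the assumed model E[S | X] = eta.
    return surrogate - eta


def simulate_residuals(n, seed=0):
    """Simulate from a correctly specified model and return its residuals."""
    rng = random.Random(seed)
    beta, cutpoints = 1.0, [-1.0, 0.0, 1.5]  # hypothetical true parameters
    residuals = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        eta = beta * x
        z = eta + rng.gauss(0.0, 1.0)
        y = 1 + sum(z > c for c in cutpoints)  # ordinal category in 1..4
        residuals.append(surrogate_residual(y, eta, cutpoints, rng))
    return residuals


res = simulate_residuals(5000)
print("mean:", fmean(res), "variance:", pvariance(res))
```

When the model is correctly specified, as in this simulation, the residuals should behave like draws from the standard normal distribution regardless of x, which is the kind of null property the abstract refers to. In practice eta and the cutpoints would come from a fitted model rather than from known true values.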