Canadian VIGOUR Centre, University of Alberta, Edmonton, Alberta, Canada.
Department of Medicine, University of Alberta, Edmonton, Alberta, Canada.
Stat Med. 2021 Jul 20;40(16):3667-3681. doi: 10.1002/sim.8991. Epub 2021 Apr 18.
Polytomous regression models generalize logistic models to the case of a categorical outcome variable with more than two distinct categories. These models are currently used in clinical research, and it is essential to measure their ability to distinguish between the categories of the outcome. In 2012, Van Calster et al. proposed the polytomous discrimination index (PDI) as an extension of the binary discrimination c-statistic to unordered polytomous regression. The PDI is a summary of the simultaneous discrimination between all outcome categories. Previous implementations of the PDI are not capable of running on "big data." This article shows that the PDI formula can be manipulated to depend only on the distributions of the predicted probabilities evaluated for each outcome category and within each observed level of the outcome, which substantially reduces the computation time. We present a SAS macro and an R function that can rapidly evaluate the PDI and its components. The routines are evaluated on several simulated datasets, varying the number of outcome categories and the size of the data, and on two large real-world administrative health datasets. We compare the PDI with two other discrimination indices, the M-index and the hypervolume under the manifold (HUM), on simulated examples. We describe situations where the PDI and HUM, indices based on multiple comparisons, are superior to the M-index, an index based on pairwise comparisons, at detecting predictions that are no different from random selection or that are erroneous because of incorrect ranking.
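The idea that the PDI can be computed from the within-category distributions of the predicted probabilities can be illustrated with a minimal Python sketch. It is not the authors' SAS macro or R function, and the name `pdi_sketch` is hypothetical. It assumes that outcome categories are coded 0 to K-1, that every category is observed at least once, and that tied predicted probabilities can be ignored (a tie is simply counted as a loss here, which is a simplification). For a case observed in category i, the fraction of K-case sets (one case drawn from each category) in which it has the highest predicted probability for category i is the product, over the other categories j, of the proportion of category-j cases with a smaller predicted probability for category i. Averaging this product over the cases in category i gives the component for category i, and averaging the components gives the PDI.

```python
import numpy as np

def pdi_sketch(probs, y):
    """Illustrative PDI from per-category distributions (ties ignored).

    probs : (n, K) array, probs[r, i] = predicted probability of category i
    y     : (n,) array of observed categories coded 0..K-1
    Returns (overall PDI, list of the K category-specific components).
    """
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y)
    k = probs.shape[1]
    components = []
    for i in range(k):
        # Predicted probability of category i for cases observed in category i
        own = probs[y == i, i]
        win = np.ones_like(own)
        for j in range(k):
            if j == i:
                continue
            # Sorted predicted probabilities of category i among category-j cases
            other = np.sort(probs[y == j, i])
            # Proportion of category-j cases strictly below each 'own' value
            win *= np.searchsorted(other, own, side="left") / other.size
        components.append(float(win.mean()))
    return float(np.mean(components)), components

# Small three-category example with perfectly separated predictions
y_demo = np.array([0, 0, 1, 1, 2, 2])
p_demo = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
    [0.3, 0.2, 0.5],
])
overall, parts = pdi_sketch(p_demo, y_demo)
print(overall, parts)
```

Because each category needs only one sort followed by binary searches, the work in this sketch grows roughly as n log n per category rather than with the number of possible K-case sets, which is the kind of saving the abstract refers to.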