MRC Clinical Trials Unit, 222 Euston Road, London NW1 2DA, UK.
Stat Med. 2010 Oct 30;29(24):2508-20. doi: 10.1002/sim.3994.
Logistic regression models are widely used in medicine for predicting patient outcome (prognosis) and constructing diagnostic tests (diagnosis). Multivariable logistic models yield an (approximately) continuous risk score, a transformation of which gives the estimated event probability for an individual. A key aspect of model performance is discrimination, that is, the model's ability to distinguish between patients who have (or will have) an event of interest and those who do not (or will not). Graphical aids are important in understanding a logistic model. The receiver-operating characteristic (ROC) curve is familiar, but not necessarily easy to interpret. We advocate a simple graphic that provides further insight into discrimination, namely a histogram or dot plot of the risk score in the outcome groups. The most popular performance measure for the logistic model is the c-index, numerically equivalent to the area under the ROC curve. We discuss the comparative merits of the c-index and the (standardized) mean difference in risk score between the outcome groups. The latter statistic, sometimes known generically as the effect size, has been computed in slightly different ways by several different authors, including Glass, Cohen and Hedges. An alternative measure is the overlap between the distributions in the outcome groups, defined as the area under the minimum of the two density functions. The larger the overlap, the weaker the discrimination. Under certain assumptions about the distribution of the risk score, the c-index, effect size and overlap are functionally related. We illustrate the ideas with simulated and real data sets.
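The functional relationship mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes the risk score is normally distributed with equal variance in the two outcome groups; the abstract does not state its exact assumptions, so this is one common choice rather than the authors' own. Under that assumption, a standardized mean difference d gives c = Φ(d/√2) and overlap = 2Φ(−|d|/2), where Φ is the standard normal distribution function. The function names, the pooled-SD form of the effect size (one of the several variants the abstract alludes to) and the simulated data are all illustrative.

```python
import math
import random
import statistics


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def c_index(events, nonevents):
    """Share of (event, non-event) pairs in which the event patient has the
    higher risk score; ties count one half. Equals the area under the ROC curve."""
    wins = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(nonevents))


def effect_size(events, nonevents):
    """Standardized mean difference in risk score, using a pooled SD
    (an assumed variant; other authors standardize differently)."""
    n1, n0 = len(events), len(nonevents)
    v1 = statistics.variance(events)
    v0 = statistics.variance(nonevents)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n0 - 1) * v0) / (n1 + n0 - 2))
    return (statistics.mean(events) - statistics.mean(nonevents)) / pooled_sd


def c_from_d(d):
    """c-index implied by effect size d under equal-variance normality."""
    return normal_cdf(d / math.sqrt(2.0))


def overlap_from_d(d):
    """Area under the minimum of the two densities, under the same assumption."""
    return 2.0 * normal_cdf(-abs(d) / 2.0)


def simulate(n_per_group=1500, true_d=1.0, seed=0):
    """Draw hypothetical risk scores for the event and non-event groups."""
    rng = random.Random(seed)
    events = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    nonevents = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    return events, nonevents


if __name__ == "__main__":
    events, nonevents = simulate()
    d = effect_size(events, nonevents)
    print("effect size d      :", round(d, 3))
    print("empirical c-index  :", round(c_index(events, nonevents), 3))
    print("c implied by d     :", round(c_from_d(d), 3))
    print("overlap implied by d:", round(overlap_from_d(d), 3))
```

With simulated normal scores the empirical c-index and the value implied by d should agree closely. With real risk scores, a noticeable gap between the two would suggest that the equal-variance normal assumption does not hold well.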