Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, USA.
Center for Clinical Research and Evidence-Based Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA.
BMC Med Res Methodol. 2024 Oct 31;24(1):261. doi: 10.1186/s12874-024-02389-x.
Medical outcomes of interest to clinicians may have multiple categories. Researchers face several options for risk prediction of such outcomes, including dichotomized logistic regression and multinomial logit regression modeling. We aimed to compare these methods and provide guidance needed for practice.
We described dichotomized logistic regression, multinomial continuation-ratio logit regression (an alternative to standard multinomial logit regression for ordinal outcomes), and logistic competing risks regression. We then applied these methods to develop prediction models of survival and neurodevelopmental outcomes based on the NICHD Extremely Preterm Birth Outcome Tool model. The statistical and practical advantages and flaws of these methods were examined. Both discrimination and calibration were assessed for the estimated logistic models of dichotomized outcomes and for the continuation-ratio logit model.
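To make the continuation-ratio idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a forward continuation-ratio parameterization in which each linear predictor gives the log-odds of falling in a category conditional on not having fallen in an earlier one. The function names, the category ordering (death, then neurodevelopmental impairment, then survival without impairment), and the numeric linear predictors are all hypothetical.

```python
import math


def expit(x):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def continuation_ratio_probs(etas):
    """Convert K-1 conditional linear predictors into K category probabilities.

    etas[j] is assumed to be the log-odds of category j given that the
    outcome is not in any earlier category (forward continuation ratio).
    """
    probs = []
    remaining = 1.0  # probability mass not yet assigned to a category
    for eta in etas:
        conditional = expit(eta)
        probs.append(remaining * conditional)
        remaining *= 1.0 - conditional
    probs.append(remaining)  # the last category takes whatever mass is left
    return probs


# Hypothetical linear predictors for one infant (illustrative values only):
# etas[0]: death vs. survival; etas[1]: impairment vs. none, given survival.
probs = continuation_ratio_probs([0.0, 0.0])
print(probs)       # [0.5, 0.25, 0.25]
print(sum(probs))  # 1.0
```

Because each category receives a share of the probability mass left over from the earlier categories, the predicted probabilities sum to 100% by construction.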
The dichotomized logistic models and the multinomial continuation-ratio logit model had similar discrimination and calibration in predicting death and survival without neurodevelopmental impairment. However, the continuation-ratio logit model had better discrimination and calibration in predicting neurodevelopmental impairment. The sum of predicted probabilities of outcome categories from the dichotomized logistic models could deviate substantially from 100%, ranging from 87.7% to 124.0%, and the dichotomized logistic model of neurodevelopmental impairment greatly overpredicted low risks and underpredicted high risks.
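The reason separately fitted dichotomized models can produce totals other than 100% can be illustrated with a short sketch. It assumes one independent binary logistic model per outcome category; the linear predictor values are hypothetical and are not taken from the study.

```python
import math


def expit(x):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def separate_model_probs(etas):
    """Predicted probability from each independently fitted binary model.

    Each model sees only "this category vs. all others", so nothing ties
    the predictions together across categories.
    """
    return [expit(eta) for eta in etas]


# Hypothetical linear predictors from three separate dichotomized models
# (death, neurodevelopmental impairment, survival without impairment).
etas = [-1.0, -0.5, -0.2]
probs = separate_model_probs(etas)
total = sum(probs)

print([round(p, 3) for p in probs])
print(f"sum of predicted probabilities: {total:.1%}")  # not constrained to 100%
```

With these illustrative inputs the total exceeds 100%; other inputs could give a total below 100%, since no constraint links the three models.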
Estimating multiple logistic regression models of dichotomized outcomes may result in poorly calibrated predictions for an outcome with multiple ordinal categories. Multinomial continuation-ratio logit regression produces better calibrated predictions, constrains the sum of predicted probabilities to 100%, and offers simple model interpretation along with the flexibility to include outcome category-specific predictors and random-effect terms for patient heterogeneity by hospital. It also accounts for mutual dependence among multiple categories and accommodates competing risks.