Li Lei, Rysavy Matthew A, Bobashev Georgiy, Das Abhik
RTI International.
University of Texas Health Science Center at Houston.
Res Sq. 2024 Feb 5:rs.3.rs-3911212. doi: 10.21203/rs.3.rs-3911212/v1.
Medical outcomes of interest to clinicians may have multiple categories. Researchers face several options for risk prediction of such outcomes, including dichotomized logistic regression and multinomial logit regression modeling. We aimed to compare these methods and provide practical guidance for choosing among them.
We described dichotomized logistic regression and multinomial logit regression, and an alternative to standard multinomial logit regression, continuation-ratio logit regression for ordinal outcomes. We then applied these methods to develop prediction models of survival and neurodevelopmental outcomes based on the NICHD Extremely Preterm Birth Outcome Tool model. We examined the statistical and practical advantages and flaws of these methods and assessed both the discrimination and the calibration of the estimated models.
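For readers unfamiliar with the continuation-ratio logit model, a standard textbook formulation for three ordered categories is sketched below. The category ordering (death, survival with neurodevelopmental impairment, survival without impairment), the symbols, and the use of category-specific coefficients are illustrative assumptions and are not taken from the paper's model specification.

```latex
% Assumed coding: Y = 1 death, Y = 2 survival with impairment,
% Y = 3 survival without impairment; x is the predictor vector.
\begin{aligned}
p_j &= P(Y = j \mid Y \ge j, \mathbf{x}), \qquad
\operatorname{logit}(p_j) = \alpha_j + \mathbf{x}^{\top}\boldsymbol{\beta}_j, \quad j = 1, 2, \\
\pi_1 &= P(Y = 1 \mid \mathbf{x}) = p_1, \\
\pi_2 &= P(Y = 2 \mid \mathbf{x}) = (1 - p_1)\,p_2, \\
\pi_3 &= P(Y = 3 \mid \mathbf{x}) = (1 - p_1)(1 - p_2), \\
\pi_1 + \pi_2 + \pi_3 &= 1 .
\end{aligned}
```

Because each category probability is built from conditional probabilities, the three predictions sum to 1 for every individual by construction.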
The dichotomized logistic models and the multinomial continuation-ratio logit model had similar discrimination and calibration in predicting death and survival without neurodevelopmental impairment. However, the continuation-ratio logit model had better discrimination and calibration in predicting the probability of neurodevelopmental impairment. The sum of the predicted probabilities of the outcome categories from the logistic models did not equal 100% for about half of the study infants, ranging from 87.7% to 124.0%, and the logistic model of neurodevelopmental impairment greatly overpredicted the risk among low-risk infants and underpredicted it among high-risk infants.
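The summation issue can be illustrated with a minimal Python sketch. It assumes three outcome categories (death, survival with impairment, survival without impairment); all function names and linear-predictor values are hypothetical and are not estimates from the study.

```python
import math


def sigmoid(z: float) -> float:
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def continuation_ratio_probs(eta_death: float, eta_ndi: float):
    """Category probabilities from two conditional (continuation-ratio) logits.

    eta_death: logit of P(death)
    eta_ndi:   logit of P(impairment | survived)
    """
    p_death = sigmoid(eta_death)
    p_ndi_given_survival = sigmoid(eta_ndi)
    p_ndi = (1.0 - p_death) * p_ndi_given_survival
    p_no_ndi = (1.0 - p_death) * (1.0 - p_ndi_given_survival)
    return p_death, p_ndi, p_no_ndi


def separate_logistic_probs(eta_death: float, eta_ndi: float, eta_no_ndi: float):
    """Predictions from three independently fitted dichotomized models.

    Nothing ties the three linear predictors together, so the
    probabilities are not constrained to sum to 1.
    """
    return sigmoid(eta_death), sigmoid(eta_ndi), sigmoid(eta_no_ndi)


# Hypothetical linear predictors for one infant (illustrative values only).
cr_total = sum(continuation_ratio_probs(-2.0, -1.0))
sep_total = sum(separate_logistic_probs(-2.0, -1.1, 0.9))

print(f"continuation-ratio total: {cr_total:.3f}")
print(f"separate-models total:    {sep_total:.3f}")
```

The continuation-ratio total is 1 by construction, whereas the total from the separate models depends on whatever each model happens to predict.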
Estimating multiple logistic regression models of dichotomized outcomes may result in poorly calibrated predictions. For an outcome with multiple ordinal categories, continuation-ratio logit regression is a useful alternative to standard multinomial logit regression. It produces better-calibrated predictions, is simple to interpret, and can flexibly include outcome category-specific predictors and random-effect terms for patient heterogeneity across hospitals.