Department of Mathematics, University of Texas at Arlington, Arlington, Texas, USA.
Stat Med. 2023 Oct 15;42(23):4111-4127. doi: 10.1002/sim.9850. Epub 2023 Jul 28.
The mixture cure model is widely used to analyze survival data in the presence of a cured subgroup. Standard logistic regression-based approaches to modeling the incidence may yield poor predictive accuracy for cure, particularly when covariate effects are non-linear. Supervised machine learning techniques can serve as better classifiers than logistic regression because they can capture non-linear patterns in the data. However, interpretability remains a concern because of the trade-off between interpretability and predictive accuracy. We propose a new mixture cure model in which the incidence part is modeled using a decision tree-based classifier, while the proportional hazards structure of the latency part is preserved. The proposed model is easy to interpret, closely mimics the human decision-making process, and is flexible enough to capture both linear and non-linear covariate effects. To estimate the model parameters, we develop an expectation-maximization (EM) algorithm. A detailed simulation study shows that the proposed model outperforms the logistic regression-based and spline regression-based mixture cure models in terms of both model fit and predictive accuracy. An illustrative analysis of data from a leukemia study further supports these conclusions.
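The abstract does not spell out the estimation steps. What follows is a minimal sketch of the standard E-step used in EM algorithms for mixture cure models, plus an illustrative incidence update for a decision tree. It assumes that pi(z) is the probability of being uncured given the incidence covariates z, that S_u(t | x) is the survival function of the uncured group from the proportional hazards latency model, and that the population survival is S_pop = 1 - pi + pi * S_u. The function names (`population_survival`, `e_step_weights`, `leaf_probabilities`) are hypothetical. The leaf-averaging step treats the tree's leaf assignments as fixed, which is a simplification, and the weighted Cox update for the latency part is omitted.

```python
import numpy as np


def population_survival(pi, s_uncured):
    """Population survival S_pop = 1 - pi + pi * S_u (pi = P(uncured))."""
    return 1.0 - pi + pi * s_uncured


def e_step_weights(delta, pi, s_uncured):
    """Posterior probability that each subject is uncured.

    delta:     event indicator (1 = event observed, 0 = censored)
    pi:        current estimate of P(uncured | z) from the incidence model
    s_uncured: current estimate of S_u(t_i | x_i) for the uncured group
    """
    delta = np.asarray(delta, dtype=float)
    pi = np.asarray(pi, dtype=float)
    s_uncured = np.asarray(s_uncured, dtype=float)
    # Censored subjects are uncured with probability pi * S_u / S_pop.
    censored_w = pi * s_uncured / population_survival(pi, s_uncured)
    # Subjects with an observed event are uncured with certainty (weight 1).
    return delta + (1.0 - delta) * censored_w


def leaf_probabilities(leaf_ids, w):
    """Incidence update for a tree with fixed leaves (hypothetical helper).

    Within a leaf with constant probability p, maximizing
    sum(w * log p + (1 - w) * log(1 - p)) gives p = mean(w) over that leaf.
    """
    leaf_ids = np.asarray(leaf_ids)
    w = np.asarray(w, dtype=float)
    return {int(leaf): float(w[leaf_ids == leaf].mean())
            for leaf in np.unique(leaf_ids)}


if __name__ == "__main__":
    # Toy values: one observed event and two censored subjects.
    w = e_step_weights([1, 0, 0], [0.8, 0.5, 0.2], [0.3, 0.4, 0.1])
    print(w)
    print(leaf_probabilities([0, 0, 1], w))
```

Each EM iteration would alternate these weights with refitting the incidence classifier and the weighted proportional hazards model until the estimates stabilize.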