Aselisewine Wisdom, Pal Suvra
Department of Mathematics, University of Texas at Arlington, Texas, USA 76019.
Division of Data Science, College of Science, University of Texas at Arlington, Arlington, TX 76019, United States.
Stat Comput. 2024 Aug;34(4). doi: 10.1007/s11222-024-10456-y. Epub 2024 Jun 25.
Cure rate models have been thoroughly investigated across various domains, including medicine, reliability, and finance. Combining machine learning (ML) with cure models is emerging as a promising strategy to improve predictive accuracy and to gain insight into the mechanisms that influence the probability of cure. The existing literature has explored the benefits of incorporating a single ML algorithm into cure models. However, no comprehensive study has compared the performance of different ML algorithms in this context. This paper seeks to bridge that gap. Specifically, we focus on the well-known mixture cure model and examine the incorporation of five distinct ML algorithms: extreme gradient boosting, neural networks, support vector machines, random forests, and decision trees. To strengthen the comparison, we also include cure models with logistic and spline-based regression. For parameter estimation, we formulate an expectation-maximization (EM) algorithm. A comprehensive simulation study is conducted across diverse scenarios to compare the models based on the accuracy and precision of estimates for different quantities of interest, along with the predictive accuracy of cure. Results from both the simulation study and the analysis of real cutaneous melanoma data indicate that incorporating ML models into cure models contributes to ongoing efforts to improve the accuracy of cure rate estimation.
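To make the estimation idea concrete, the following is a minimal sketch of an EM algorithm for a mixture cure model. It is not the authors' implementation. It rests on simplifying assumptions that do not come from the abstract: no covariates, a constant probability of being uncured, and an exponential latency distribution. The function names (`e_step`, `fit_em`, `simulate`) are hypothetical. In a covariate-dependent version, the line that updates `p_uncured` would presumably be replaced by fitting a classifier (logistic regression or an ML model) to the E-step weights.

```python
import numpy as np


def e_step(time, event, p_uncured, rate):
    """Posterior probability that each subject is uncured (susceptible).

    Subjects with an observed event are uncured with probability 1.
    For censored subjects the weight is
        p * S_u(t) / ((1 - p) + p * S_u(t)),
    where S_u(t) = exp(-rate * t) under the assumed exponential latency.
    """
    surv_u = np.exp(-rate * time)
    denom = (1.0 - p_uncured) + p_uncured * surv_u
    w_censored = p_uncured * surv_u / denom
    return np.where(event == 1, 1.0, w_censored)


def fit_em(time, event, n_iter=200, tol=1e-8):
    """Intercept-only EM: alternate the E-step with closed-form M-step updates."""
    p_uncured = 0.5
    rate = event.sum() / time.sum()  # crude starting value
    for _ in range(n_iter):
        w = e_step(time, event, p_uncured, rate)
        # M-step (incidence): with no covariates, the update is the mean weight.
        new_p = w.mean()
        # M-step (latency): weighted exponential MLE, events / weighted exposure.
        new_rate = event.sum() / np.sum(w * time)
        converged = abs(new_p - p_uncured) + abs(new_rate - rate) < tol
        p_uncured, rate = new_p, new_rate
        if converged:
            break
    return p_uncured, rate


def simulate(n, p_uncured, rate, cens_time, seed=0):
    """Simulate cure-model data with administrative censoring at cens_time."""
    rng = np.random.default_rng(seed)
    uncured = rng.random(n) < p_uncured
    latent = rng.exponential(1.0 / rate, size=n)
    latent[~uncured] = np.inf  # cured subjects never experience the event
    time = np.minimum(latent, cens_time)
    event = (latent <= cens_time).astype(int)
    return time, event


if __name__ == "__main__":
    t, d = simulate(n=5000, p_uncured=0.6, rate=1.0, cens_time=8.0)
    p_hat, rate_hat = fit_em(t, d)
    print(f"estimated uncured probability: {p_hat:.3f}")
    print(f"estimated latency rate: {rate_hat:.3f}")
```

With long follow-up relative to the event times, as in the simulated example, the estimated uncured probability and latency rate should land close to the values used to generate the data. The estimated cure probability is one minus the uncured probability.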