Faculty of Science, Beijing University of Technology, Beijing, China.
Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA.
Stat Med. 2024 Jan 15;43(1):1-15. doi: 10.1002/sim.9938. Epub 2023 Oct 24.
Survival among cancer patients is highly heterogeneous, ranging from a few months to several decades. To predict clinical outcomes accurately, it is vital to build a predictive model that relates patients' molecular profiles to their survival. Because the relationships between survival and high-dimensional molecular predictors are complex, it is challenging to model them nonparametrically and remove irrelevant predictors at the same time. In this article, we build a semi-parametric kernel Cox proportional hazards model and propose a novel regularized garrotized kernel machine (RegGKM) method to fit it. The kernel machine describes the complex relationship between survival and predictors, while a LASSO penalty automatically removes irrelevant parametric and nonparametric predictors. We develop an efficient algorithm for the proposed method in high-dimensional settings. In simulations, the proposed method always achieves better predictive accuracy than the competing methods. We apply the method to a multiple myeloma dataset and predict the patients' death burden from their gene expressions. Our results can help classify patients into groups with different death risks, facilitating treatment for better clinical outcomes.
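To make the idea concrete, the following is a minimal Python sketch of how a garrotized kernel can be combined with a Cox partial likelihood and a LASSO-type penalty. It is an illustration under stated assumptions, not the authors' RegGKM algorithm: the Gaussian form of the kernel, the nonnegative per-predictor weights `delta`, the ridge term on the kernel coefficients `alpha`, the tuning parameters `lam_ridge` and `lam_lasso`, and all function names are hypothetical choices made here. The sketch also omits the parametric part of the model and assumes no tied event times.

```python
import numpy as np


def garrotized_gaussian_kernel(X, delta):
    """K[i, k] = exp(-sum_j delta[j] * (X[i, j] - X[k, j])**2), with delta[j] >= 0.

    A predictor with delta[j] == 0 drops out of the kernel entirely
    (assumed kernel form, chosen for illustration).
    """
    Xs = X * np.sqrt(delta)                       # scale column j by sqrt(delta[j])
    sq = np.sum(Xs ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Xs @ Xs.T
    return np.exp(-np.maximum(d2, 0.0))           # clip tiny negative round-off


def neg_log_partial_likelihood(eta, time, event):
    """Cox negative log partial likelihood, assuming no tied event times."""
    order = np.argsort(-time)                     # descending time
    eta_o, event_o = eta[order], event[order]
    # Running log-sum-exp gives log of the risk-set sum for each subject.
    log_risk = np.logaddexp.accumulate(eta_o)
    return -np.sum((eta_o - log_risk) * event_o)


def penalized_objective(alpha, delta, X, time, event, lam_ridge, lam_lasso):
    """Partial likelihood loss + ridge penalty on alpha + L1 penalty on delta."""
    K = garrotized_gaussian_kernel(X, delta)
    eta = K @ alpha                               # nonparametric log-hazard term
    return (neg_log_partial_likelihood(eta, time, event)
            + 0.5 * lam_ridge * alpha @ K @ alpha
            + lam_lasso * np.sum(delta))          # delta >= 0, so sum equals L1 norm
```

Under these assumptions, minimizing the objective over `alpha` and `delta` (subject to `delta >= 0`) would shrink some entries of `delta` to exactly zero, which removes the corresponding predictors from the kernel and hence from the fitted model.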