Luo Bin, Halabi Susan
School of Data Science and Analytics, Kennesaw State University, Kennesaw, GA 30144, USA.
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA.
Bioengineering (Basel). 2025 May 31;12(6):596. doi: 10.3390/bioengineering12060596.
This study addresses the problem of simultaneous variable selection and model estimation in multivariate failure time data, a common challenge in clinical trials with multiple correlated time-to-event endpoints. We propose a unified framework that identifies predictors shared across outcomes, applicable to both low- and high-dimensional settings. For linear marginal hazard models, we develop a penalized pseudo-partial likelihood approach with a group LASSO-type penalty applied to the ℓ2 norms of coefficients corresponding to the same covariates across marginal hazard functions. To capture potential nonlinear effects, we further extend the approach to a sparse-input neural network model with structured group penalties on input-layer weights. Both methods are optimized using a composite gradient descent algorithm combining standard gradient steps with proximal updates. Simulation studies demonstrate that the proposed methods yield superior variable selection and predictive performance compared to traditional and outcome-specific approaches, while remaining robust to violations of the common predictor assumption. In an application to advanced prostate cancer data, the framework identifies both established clinical factors and potentially novel prognostic single-nucleotide polymorphisms for overall and progression-free survival. This work provides a flexible and robust tool for analyzing complex multivariate survival data, with potential utility in prognostic modeling and personalized medicine.
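As an illustration of the optimization idea described above, a gradient step followed by a proximal update for a group LASSO-type penalty can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the coefficient matrix `B` (one row per covariate, one column per marginal hazard function), the caller-supplied gradient `grad` of the smooth loss, and the names `step_size` and `lam` are all hypothetical, and the abstract does not specify the exact objective, group weights, or step-size rule.

```python
import numpy as np


def group_soft_threshold(B, threshold):
    """Row-wise proximal operator of threshold * sum_j ||B[j, :]||_2.

    Each row of B holds the coefficients of one covariate across all
    outcomes (an assumed layout), so a row is shrunk or zeroed as a group.
    """
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Scale factor max(0, 1 - threshold / norm); the small floor avoids
    # division by zero, and rows that are already zero remain zero.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return scale * B


def proximal_gradient_step(B, grad, step_size, lam):
    """One composite update: plain gradient step, then group shrinkage.

    `grad` is assumed to be the gradient of the smooth part of the
    objective (for example a negative log pseudo-partial likelihood)
    evaluated at B; computing it is outside the scope of this sketch.
    """
    return group_soft_threshold(B - step_size * grad, step_size * lam)
```

With a threshold of 1, a row (3, 4) with ℓ2 norm 5 is shrunk to (2.4, 3.2), while a row (0.3, 0.4) with norm 0.5 is set exactly to zero. Zeroing a whole row removes that covariate from every outcome at once, which is how a penalty of this type selects predictors shared across outcomes.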