Yin Wenjing, Zhao Sihai Dave, Liang Feng
Department of Statistics, University of Illinois, Urbana-Champaign, Champaign, IL, USA.
Lifetime Data Anal. 2022 Apr;28(2):282-318. doi: 10.1007/s10985-022-09549-5. Epub 2022 Mar 3.
For high-dimensional gene expression data, one important goal is to identify a small number of genes that are associated with disease progression or patient survival. In this paper, we consider the problem of variable selection for multivariate survival data. We propose an estimation procedure for high-dimensional accelerated failure time (AFT) models with bivariate censored data. The method extends the Buckley-James method by minimizing a penalized [Formula: see text] loss function with a penalty function induced by a bivariate spike-and-slab prior. In the proposed algorithm, censored observations are imputed using the Kaplan-Meier estimator, which avoids a parametric assumption on the error terms. Our empirical studies demonstrate that the proposed method performs better than alternative procedures designed for univariate survival data, regardless of whether the true events are correlated. The method also provides a formal way of handling bivariate survival data in AFT models. Findings from the analysis of a myeloma clinical trial using the proposed method are also presented.
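The abstract describes the algorithm only at a high level, so the following is a minimal sketch of the penalized Buckley-James idea under stated assumptions, not the authors' implementation. It assumes a univariate AFT model, log T = b0 + x'b + e, with observed responses y = min(log T, log C) and event indicators delta. Censored responses are imputed from a Kaplan-Meier estimate of the residual distribution, and the imputed responses are then refit by penalized least squares. Because the bivariate spike-and-slab penalty is not specified here, a plain lasso (L1) penalty fitted by coordinate descent is swapped in as a stand-in, and the bivariate structure is omitted. The function names (`km_residual_jumps`, `bj_impute`, `lasso_cd`, `penalized_bj`) and the convention of treating the largest residual as uncensored are hypothetical choices made for this sketch.

```python
# Sketch only: univariate AFT model, with a lasso penalty substituted for the
# paper's bivariate spike-and-slab penalty. All names here are hypothetical.
import random
from typing import List, Sequence, Tuple


def km_residual_jumps(resid: Sequence[float], delta: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier jump masses of the residual distribution at uncensored residuals.

    The largest residual is treated as uncensored so the masses sum to one
    (a common Buckley-James convention, assumed here).
    """
    n = len(resid)
    order = sorted(range(n), key=lambda i: resid[i])
    d = list(delta)
    d[order[-1]] = 1
    surv, at_risk, k, jumps = 1.0, n, 0, []
    while k < n:
        t, j, events = resid[order[k]], k, 0
        while j < n and resid[order[j]] == t:  # group tied residuals
            events += d[order[j]]
            j += 1
        if events:
            new_surv = surv * (1.0 - events / at_risk)
            jumps.append((t, surv - new_surv))
            surv = new_surv
        at_risk -= j - k
        k = j
    return jumps


def bj_impute(y, delta, fitted):
    """Buckley-James step: replace each censored y_i by fitted_i + E[e | e > r_i]."""
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    jumps = km_residual_jumps(resid, delta)
    out = []
    for yi, di, fi, ri in zip(y, delta, fitted, resid):
        tail = [(v, m) for v, m in jumps if v > ri] if not di else []
        mass = sum(m for _, m in tail)
        # Uncensored values, and censored ones with no KM mass beyond r_i, are kept.
        out.append(fi + sum(v * m for v, m in tail) / mass if mass > 0.0 else yi)
    return out


def soft(z, t):
    """Soft-thresholding operator used by the L1 (lasso) coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - b0 - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    r = [yi - ybar for yi in y]  # current residual, starting from b = 0
    sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if sq[j] == 0.0:
                continue
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + sq[j] * beta[j]
            diff = soft(rho, lam) / sq[j] - beta[j]
            if diff != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * diff
                beta[j] += diff
    return ybar - sum(b * m for b, m in zip(beta, xbar)), beta


def penalized_bj(X, y, delta, lam, n_outer=20, tol=1e-6):
    """Alternate KM-based imputation and penalized least squares until estimates settle."""
    b0, beta = lasso_cd(X, y, lam)  # start by treating censored values as observed
    y_star = list(y)
    for _ in range(n_outer):
        fitted = [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]
        y_star = bj_impute(y, delta, fitted)
        nb0, nbeta = lasso_cd(X, y_star, lam)
        change = max(abs(a - b) for a, b in zip([b0] + beta, [nb0] + nbeta))
        b0, beta = nb0, nbeta
        if change < tol:
            break
    return b0, beta, y_star
```

For example, `penalized_bj(X, y, delta, lam=0.05)` returns the intercept, the coefficient vector and the last set of imputed responses. Plain Buckley-James iterations are not guaranteed to converge and can oscillate, which is why the sketch caps the number of outer iterations rather than looping until convergence.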