Ren Jie, Du Yinhao, Li Shaoyu, Ma Shuangge, Jiang Yu, Wu Cen
Department of Statistics, Kansas State University, Manhattan, Kansas.
Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, North Carolina.
Genet Epidemiol. 2019 Apr;43(3):276-291. doi: 10.1002/gepi.22194. Epub 2019 Feb 11.
In cancer genomic studies, an important objective is to identify prognostic markers associated with patients' survival. Network-based regularization has achieved success in variable selection for high-dimensional cancer genomic data because it can incorporate the correlations among genomic features. However, survival time data usually follow skewed distributions and are contaminated by outliers, so network-constrained regularization that does not account for robustness leads to false identification of network structure and biased estimation of patients' survival. In this study, we develop a novel robust network-based variable selection method under the accelerated failure time model. Extensive simulation studies show the advantage of the proposed method over alternative methods. Two case studies of lung cancer datasets with high-dimensional gene expression measurements demonstrate that the proposed approach identifies markers with important implications.