Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran.
Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran.
Comput Math Methods Med. 2021 Sep 19;2021:5169052. doi: 10.1155/2021/5169052. eCollection 2021.
Variable selection and penalized regression models in high-dimensional settings have become an increasingly important topic in many disciplines. For instance, omics data generated in biomedical research may be associated with patient survival and can offer insights into disease dynamics, helping to identify patients with a worse prognosis and to improve therapy. Analysis of high-dimensional time-to-event data in the presence of competing risks requires special modeling techniques. So far, some attempts have been made at variable selection in low- and high-dimensional competing risk settings using partial likelihood-based procedures. In this paper, a weighted likelihood-based penalized approach is extended for direct variable selection under the subdistribution hazards model for high-dimensional competing risk data. The proposed method considers a larger class of semiparametric regression models for the subdistribution and allows time-varying effects to be taken into account. This is of particular importance because the proportional hazards assumption may not be valid in general, especially in the high-dimensional setting. The model also relaxes the constraint that the Fine and Gray approach places on simultaneously modeling multiple cumulative incidence functions. The performance of several penalties, including the minimax concave penalty (MCP), adaptive LASSO, and smoothly clipped absolute deviation (SCAD), as well as their L counterparts, was investigated through simulation studies in terms of sensitivity and specificity. The results revealed that the sensitivity of all penalties was comparable, but the MCP and MCP-L penalties outperformed the other methods in terms of selecting fewer noninformative variables. 
The practical use of the model was investigated through the analysis of genomic competing risk data obtained from patients with bladder cancer. Six genes (CDC20, NCF2, SMARCAD1, RTN4, ETFDH, and SON) were identified by all the methods and were significantly correlated with the subdistribution.
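For readers unfamiliar with the penalties named in the abstract, the following is a minimal sketch of their standard textbook forms together with the sensitivity/specificity measure used to score variable selection. It does not reproduce the paper's weighted-likelihood fitting procedure. The function names, the default tuning constants (`a=3.7` for SCAD, `gamma=3.0` for MCP), and the toy index sets are assumptions chosen for illustration only.

```python
def adaptive_lasso_penalty(beta, lam, weight=1.0):
    """Weighted L1 penalty; `weight` is typically 1/|initial estimate| (assumed here)."""
    return lam * weight * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of a single coefficient (a=3.7 is a conventional default)."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2  # constant: large effects are not shrunk further


def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty of a single coefficient (gamma=3.0 is an assumed default)."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b ** 2 / (2 * gamma)
    return gamma * lam ** 2 / 2  # constant beyond gamma * lam


def selection_sensitivity_specificity(selected, truly_active, n_variables):
    """Score a selected variable set against the known active set of a simulation."""
    selected, truly_active = set(selected), set(truly_active)
    true_pos = len(selected & truly_active)
    false_pos = len(selected - truly_active)
    n_inactive = n_variables - len(truly_active)
    sensitivity = true_pos / len(truly_active)
    specificity = (n_inactive - false_pos) / n_inactive
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical toy case: 10 variables, 3 truly active, 3 selected.
    print(selection_sensitivity_specificity({0, 1, 5}, {0, 1, 2}, 10))
```

Both SCAD and MCP flatten to a constant for large coefficients, which is why they penalize strong signals less than a LASSO-type penalty does; a method that selects fewer noninformative variables scores a higher specificity under the measure above.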