Utazirubanda Jean Claude, Leon Tomas, Ngom Papa
LMA, Université Cheikh Anta Diop, Dakar, Senegal.
School of Public Health, University of California, Berkeley, USA.
Commun Stat Simul Comput. 2021;50(3):881-901. doi: 10.1080/03610918.2019.1571605. Epub 2018 Feb 28.
In the analysis of survival outcomes supplemented with both clinical information and high-dimensional gene expression data, the traditional Cox proportional hazards model fails to meet some emerging needs in biomedical research. First, the number of covariates is generally much larger than the sample size. Second, predicting an outcome based on individual gene expression is inadequate because multiple biological processes and functional pathways regulate phenotypic expression. Third, the Cox model assumes that populations are homogeneous, implying that all individuals have the same risk of death, which is rarely true because of unmeasured risk factors among populations. In this paper we propose group LASSO with gamma-distributed frailty for variable selection in Cox regression, extending previous scholarship to account for heterogeneity among group structures related to exposure and susceptibility. The consistency property of the proposed method is established. This method is appropriate for addressing a wide variety of research questions, from genetics to air pollution. Analyses of simulated and real-world data show promising performance by group LASSO compared with other methods, including group SCAD and group MCP. Future research directions include expanding the use of frailty with adaptive group LASSO and sparse group LASSO methods.
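To illustrate how a group LASSO penalty selects covariates at the level of whole groups (for example, genes in one pathway), the following is a minimal sketch of the standard group soft-thresholding step. It is not the authors' implementation: the Cox partial likelihood and the gamma frailty term are omitted, and the function names, the square-root-of-group-size weights, and the non-overlapping group layout are assumptions made only for this example.

```python
import math


def group_lasso_penalty(beta, groups, lam):
    """Group LASSO penalty: lam * sum_g sqrt(p_g) * ||beta_g||_2.

    beta   : list of floats (coefficients)
    groups : list of index lists, one per non-overlapping group (assumed layout)
    lam    : non-negative penalty level
    """
    total = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        total += math.sqrt(len(idx)) * norm
    return lam * total


def group_soft_threshold(beta, groups, lam):
    """Proximal step for the penalty above (hypothetical helper).

    Each group is shrunk toward zero as a block; a group whose Euclidean
    norm is below its threshold is set to zero entirely, which is how
    group-level variable selection arises.
    """
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        # Threshold scaled by sqrt(group size), a common weighting choice.
        thresh = lam * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        for j in idx:
            out[j] = scale * beta[j]
    return out


# Example: the first group has a large norm and is only shrunk;
# the second group has a small norm and is removed as a block.
beta = [3.0, 4.0, 0.1, -0.1]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(beta, groups, lam=1.0)
penalty = group_lasso_penalty(beta, groups, lam=1.0)
```

In a full fitting procedure this step would typically alternate with a gradient update on the penalized likelihood; that part depends on the model details in the paper and is not reproduced here.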