Liu Mengque, Zhang Qingzhao, Fang Kuangnan, Ma Shuangge
School of Journalism and New Media, Xi'an Jiaotong University.
School of Economics, Xiamen University.
Comput Stat Data Anal. 2020 Apr;144. doi: 10.1016/j.csda.2019.106883. Epub 2019 Nov 13.
The finite mixture of regression (FMR) model is a popular tool for accommodating data heterogeneity. When FMR models are fitted with high-dimensional covariates, regularized estimation is needed to separate important covariates from noise. The literature has paid little attention to differences among the important covariates, which give rise to an underlying structure of covariate effects. Specifically, important covariates fall into two types: those that behave the same in different subpopulations and those that behave differently. Structured analysis that identifies these types enables researchers to better understand covariates and their associations with outcomes. This study considers the FMR model with high-dimensional covariates and develops a structured penalization approach for regularized estimation, selection of important variables, and, equally importantly, identification of the underlying covariate effect structure. The proposed approach can be effectively realized, and its statistical properties are rigorously established. Simulation demonstrates its superiority over alternatives. In the analysis of cancer gene expression data, the approach identifies interesting models/structures missed by existing analyses.
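To make the two types of important covariates concrete, the following is a minimal sketch of a Gaussian FMR model and of one generic way a structured penalty could be written. All notation here ($K$, $\pi_k$, $\beta_{kj}$, $\sigma_k$, $p_1$, $p_2$, $\lambda_1$, $\lambda_2$) is assumed for illustration and does not appear in the abstract; the penalized objective in particular is a generic form, not necessarily the one developed in the paper.

```latex
% Gaussian FMR density with K subpopulations (assumed notation)
f(y \mid x; \theta) = \sum_{k=1}^{K} \pi_k \,
  \frac{1}{\sqrt{2\pi}\,\sigma_k}
  \exp\!\left\{ -\frac{(y - x^{\top}\beta_k)^2}{2\sigma_k^2} \right\},
\qquad \pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1.

% Covariate j, with coefficients (\beta_{1j}, \dots, \beta_{Kj}), is
%   noise:               \beta_{kj} = 0 for all k;
%   important, same:     \beta_{1j} = \dots = \beta_{Kj} \neq 0;
%   important, different: \beta_{kj} \neq \beta_{k'j} for some k \neq k'.

% One generic structured penalized objective (illustrative assumption):
% p_1 shrinks individual coefficients, p_2 shrinks between-component differences.
Q(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \log f(y_i \mid x_i; \theta)
  + \sum_{k=1}^{K} \sum_{j=1}^{p} p_1\!\left(|\beta_{kj}|; \lambda_1\right)
  + \sum_{k < k'} \sum_{j=1}^{p} p_2\!\left(|\beta_{kj} - \beta_{k'j}|; \lambda_2\right).
```

Under this sketch, a first penalty term that sets $\beta_{kj}$ to zero in every component marks covariate $j$ as noise, while a second term that sets all pairwise differences to zero marks it as an important covariate with the same effect across subpopulations.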