Li Yanming, Nan Bin, Zhu Ji
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, 48109, U.S.A.
Department of Statistics, University of Michigan, Ann Arbor, Michigan, 48109, U.S.A.
Biometrics. 2015 Jun;71(2):354-63. doi: 10.1111/biom.12292. Epub 2015 Mar 2.
We propose a multivariate sparse group lasso method for variable selection and estimation in data with both high-dimensional predictors and high-dimensional response variables. The method fits a penalized multivariate multiple linear regression model that allows an arbitrary group structure on the regression coefficient matrix. It is well suited to biological studies that seek associations between multiple traits and multiple predictors, where each trait and each predictor belongs to a biological functional group such as a gene, a pathway, or a brain region. The method effectively removes unimportant groups as well as unimportant individual coefficients within important groups, particularly in large-p, small-n problems, and it handles complex group structures flexibly, including overlapping, nested, and multilevel hierarchical structures. We evaluate the method through extensive simulations, comparing it with the conventional lasso and group lasso, and apply it to an eQTL association study.
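To make the penalty described in the abstract concrete, the following is a minimal sketch of a generic sparse group lasso penalty on a coefficient matrix. It combines an entrywise L1 term with a sum of group-wise L2 norms, where the groups are arbitrary and may overlap. The function name, the parameter names (`lam_group`, `lam_entry`, `group_weights`), and the default square-root-of-group-size weights are illustrative assumptions, not the authors' notation. The exact weighting and parametrization used in the paper may differ.

```python
import math


def sparse_group_lasso_penalty(B, groups, lam_group, lam_entry, group_weights=None):
    """Evaluate a generic sparse group lasso penalty on a coefficient matrix.

    B             : list of rows; B[j][k] is the coefficient of predictor j
                    for response k.
    groups        : list of groups; each group is a list of (j, k) index
                    pairs. Groups may overlap or be nested.
    lam_group     : tuning parameter for the group-level L2 term (assumed name).
    lam_entry     : tuning parameter for the entrywise L1 term (assumed name).
    group_weights : optional per-group weights; defaults to sqrt(group size),
                    a common convention that is assumed here, not taken
                    from the paper.
    """
    if group_weights is None:
        group_weights = [math.sqrt(len(cells)) for cells in groups]

    # Entrywise L1 term: encourages individual coefficients to be zero.
    l1_term = sum(abs(b) for row in B for b in row)

    # Group term: weighted sum of L2 norms, which can zero out whole groups.
    group_term = 0.0
    for weight, cells in zip(group_weights, groups):
        group_term += weight * math.sqrt(sum(B[j][k] ** 2 for j, k in cells))

    return lam_entry * l1_term + lam_group * group_term


# Toy example: 2 predictors x 2 responses, with three overlapping groups.
B_example = [[3.0, 4.0],
             [0.0, 0.0]]
groups_example = [
    [(0, 0), (0, 1)],  # all coefficients of predictor 0
    [(1, 0), (1, 1)],  # all coefficients of predictor 1
    [(0, 0), (1, 0)],  # all coefficients for response 0 (overlaps the others)
]
penalty_example = sparse_group_lasso_penalty(
    B_example, groups_example, lam_group=1.0, lam_entry=0.5,
    group_weights=[1.0, 1.0, 1.0],
)
```

In this toy example the second group contributes nothing to the penalty because all of its coefficients are zero. This is the mechanism by which a group-level penalty can remove whole groups, while the L1 term acts on individual coefficients within the groups that remain.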