Department of Epidemiology & Biostatistics, Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, PO Box 7057, 1007 MB Amsterdam, The Netherlands and Mathematical Institute, Leiden University, PO Box 9512, 2300 RA Leiden, The Netherlands.
Department of Epidemiology & Biostatistics, Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, PO Box 7057, 1007 MB Amsterdam, The Netherlands.
Biostatistics. 2021 Oct 13;22(4):723-737. doi: 10.1093/biostatistics/kxz062.
In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (i) $p$-values from a previous study and (ii) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection but is not straightforward. We propose a group-regularized (logistic) elastic net regression method, where each penalty parameter corresponds to a group of features based on the external information. The method, termed gren, makes use of the Bayesian formulation of logistic elastic net regression to estimate both the model and penalty parameters in an approximate empirical-variational Bayes framework. Simulations and applications to three cancer genomics studies and one Alzheimer's disease metabolomics study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.
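As a rough illustration of what group regularization means here, one generic way to write a group-wise elastic net penalty for logistic regression is sketched below. The notation (binary outcomes $y_i$, feature vectors $\mathbf{x}_i$, coefficients $\beta_j$, $G$ feature groups $\mathcal{G}_g$, and group-specific penalties $\lambda_{1,g}$ and $\lambda_{2,g}$) is assumed for illustration and is not taken from the abstract; the exact parameterization used by gren may differ.

$$
\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n} \left[ y_i \mathbf{x}_i^\top \boldsymbol{\beta} - \log\left(1 + e^{\mathbf{x}_i^\top \boldsymbol{\beta}}\right) \right] - \sum_{g=1}^{G} \left( \lambda_{1,g} \sum_{j \in \mathcal{G}_g} |\beta_j| + \lambda_{2,g} \sum_{j \in \mathcal{G}_g} \beta_j^2 \right) \right\}
$$

With a single group ($G = 1$) this reduces to ordinary elastic net logistic regression. With several groups, features in a group that the external information marks as more relevant can receive smaller penalties and are therefore shrunk less.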