Lu Rong, Wang Danxin, Wang Min, Rempala Grzegorz A
Bioinformatics Core Facility, Department of Clinical Sciences, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390.
Center for Pharmacogenomics, College of Medicine, The Ohio State University, 333 W. 10th Avenue, Columbus, OH 43210.
Commun Stat Theory Methods. 2018;47(21):5163-5195. doi: 10.1080/03610926.2017.1388397. Epub 2017 Nov 20.
We derive explicit formulas for Sobol's sensitivity indices (SSIs) under generalized linear models (GLMs) with independent or multivariate normal inputs. We argue that the main-effect SSIs provide a powerful tool for variable selection under GLMs with identity links in polynomial regression settings. We also show via examples that SSI-based variable selection gives results similar to those obtained by the random forest algorithm, but without the computational burden of data permutation. Finally, applying our results to the problem of gene network discovery, we identify, through SSI analysis of a public microarray dataset, several novel higher-order gene-gene interactions missed by more standard inference methods. The relevant functions for SSI analysis derived here under GLMs with identity, log, and logit links are implemented and made available in the R package .
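As a rough illustration of what a main-effect SSI looks like in the simplest setting the abstract covers, the sketch below handles only a first-order linear predictor with an identity link and multivariate normal inputs. It relies on the standard definition S_i = Var(E[Y | X_i]) / Var(Y), which for Y = b0 + sum_j beta_j X_j with X ~ N(mu, Sigma) reduces to S_i = (Sigma beta)_i^2 / (Sigma_ii * beta' Sigma beta). The function name `main_effect_ssi` is hypothetical, no noise term is included, and this is not the API of the authors' R package nor their general formulas for polynomial terms or log and logit links.

```python
def main_effect_ssi(beta, cov):
    """Main-effect Sobol indices for Y = b0 + sum_j beta[j] * X[j].

    Assumes X is multivariate normal with covariance matrix `cov`
    (a list of lists) and that Y is a deterministic function of X
    (no noise term). Hypothetical helper, not the paper's R code.
    """
    p = len(beta)
    # (cov @ beta)[i] equals Cov(X_i, Y)
    cov_xy = [sum(cov[i][j] * beta[j] for j in range(p)) for i in range(p)]
    # Var(Y) = beta' cov beta
    var_y = sum(beta[i] * cov_xy[i] for i in range(p))
    # Var(E[Y | X_i]) = Cov(X_i, Y)^2 / Var(X_i) for jointly normal inputs
    return [cov_xy[i] ** 2 / (cov[i][i] * var_y) for i in range(p)]


# Independent inputs: indices reduce to beta_i^2 * sigma_i^2 / Var(Y)
independent = main_effect_ssi(
    [2.0, 1.0, 0.0],
    [[1.0, 0.0, 0.0],
     [0.0, 4.0, 0.0],
     [0.0, 0.0, 1.0]],
)
print(independent)  # [0.5, 0.5, 0.0]

# Correlated inputs: main-effect indices need not sum to one
correlated = main_effect_ssi([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
print(correlated)  # [0.75, 0.75]
```

In the independent case an input with a zero coefficient gets an index of zero, which is the intuition behind using main-effect SSIs for variable selection. In the correlated case the indices can sum to more than one, because each input also carries variance it shares with the others.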