Turner Jacob A, Bolen Christopher R, Blankenship Derek M
Baylor Research Institute, 3310 Live Oak, Dallas, 75204, TX, USA.
Department of Microbiology and Immunology, Stanford University School, Stanford, 94305, CA, USA.
BMC Bioinformatics. 2015 Aug 28;16:272. doi: 10.1186/s12859-015-0707-9.
Gene set analysis (GSA) of gene expression data can be highly powerful when the biological signal is weak compared to other sources of variability in the data. However, many gene set analysis approaches rely on permutation tests, which are not appropriate for complex study designs. For example, permuting samples to compare time points within a longitudinal study breaks the correlation among repeated measurements from the same subject. Linear mixed models provide a way to analyze longitudinal studies, adjust for potential confounding factors, and account for sources of variability that are not of primary interest. Currently, there are no known gene set analysis approaches that fully account for these aspects of study design and analysis. To address this, we generalize the QuSAGE gene set analysis algorithm and provide the estimation adjustments needed to incorporate linear mixed model analyses; we denote the generalized method Q-Gen.
We assessed the performance of our generalized method against the original QuSAGE method in settings such as longitudinal repeated measures analysis and adjustment for potential confounders. We demonstrate that the original QuSAGE method cannot control the type I error rate when these complexities exist. Beyond statistical appropriateness, analysis of a longitudinal influenza study suggests that Q-Gen can offer greater sensitivity when exploring a large number of gene sets.
Q-Gen is an extension of the QuSAGE gene set analysis method that allows linear mixed models to be applied appropriately within a gene set analysis framework. It gives GSA a layer of flexibility that was not previously available. This flexibility allows for more appropriate statistical modeling of the complex data structures inherent to many microarray study designs and can provide greater sensitivity.
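The point about repeated measurements from the same subject can be illustrated with a small simulation. The sketch below is not Q-Gen or QuSAGE and fits no mixed model; it is a toy example under assumed settings (one gene, two time points, a random intercept per subject, and hypothetical values for `n_subjects`, `subject_sd`, and `noise_sd`). It shows that freely permuting samples across time points removes the within-subject correlation, and that ignoring the pairing changes the estimated variance of the mean difference between time points.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_subjects=200, subject_sd=2.0, noise_sd=1.0, seed=0):
    """Simulate one gene at two time points with a random intercept per subject.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, subject_sd)           # subject-level random intercept
        t1.append(b + rng.gauss(0.0, noise_sd))  # expression at time point 1
        t2.append(b + rng.gauss(0.0, noise_sd))  # time point 2, no true change
    return t1, t2, rng


t1, t2, rng = simulate()

# Correlation between time points when subject identity is preserved.
corr_paired = pearson(t1, t2)

# Free permutation of all samples across time points, ignoring subject identity.
pooled = t1 + t2
rng.shuffle(pooled)
p1, p2 = pooled[:len(t1)], pooled[len(t1):]
corr_permuted = pearson(p1, p2)

# Variance of the mean difference between time points:
# paired (within-subject differences) vs. treating the time points as independent.
diffs = [b - a for a, b in zip(t1, t2)]
var_paired = statistics.variance(diffs) / len(diffs)
var_unpaired = (statistics.variance(t1) + statistics.variance(t2)) / len(t1)

print(f"correlation, subjects intact:  {corr_paired:.2f}")
print(f"correlation, after permutation: {corr_permuted:.2f}")
print(f"variance of mean difference, paired:   {var_paired:.4f}")
print(f"variance of mean difference, unpaired: {var_unpaired:.4f}")
```

In this toy setting the random intercept plays the role that a subject-level random effect would play in a linear mixed model: it is the source of the correlation that a free permutation discards.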