Institute for Medical Microbiology, Immunology and Hygiene, Technische Universität München, Munich, Germany.
St. Mary's Hospital, Department of Medicine, Frankfurt, Germany; Medical Department, Johannes Gutenberg University, Mainz, Germany.
PLoS One. 2014 Mar 21;9(3):e91840. doi: 10.1371/journal.pone.0091840. eCollection 2014.
Many studies examine gene expression data obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to disease. The interplay of multiple factors may lead to effect modification and confounding. Higher-order linear regression models, which include interaction terms, can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains and a mock control), and treatment or non-treatment with interferon-γ.
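To make effect modification concrete, here is a minimal Python sketch that reduces two of the three factors to a 2x2 layout (infected or not, interferon-γ or not). The numbers, the function name `saturated_2x2_coefficients`, and the restriction to two binary factors are illustrative assumptions, not the study's data or code. With treatment coding, the least-squares fit of the saturated model y = b0 + bA·A + bB·B + bAB·A·B reproduces the four cell means, so each coefficient can be read off as a contrast of cell means; the interaction coefficient bAB is the joint effect minus the sum of the two single effects.

```python
from statistics import mean


def saturated_2x2_coefficients(cells):
    """Return (b0, bA, bB, bAB) for a saturated, treatment-coded 2x2 model.

    `cells` maps (a, b) with a, b in {0, 1} to a list of replicate values
    (e.g. log expression of one gene). Hypothetical helper for illustration.
    """
    m = {key: mean(values) for key, values in cells.items()}
    b0 = m[(0, 0)]                  # baseline cell: A = 0, B = 0
    b_a = m[(1, 0)] - m[(0, 0)]     # effect of A alone
    b_b = m[(0, 1)] - m[(0, 0)]     # effect of B alone
    # Interaction: observed joint effect minus the additive expectation
    b_ab = (m[(1, 1)] - m[(0, 0)]) - (b_a + b_b)
    return b0, b_a, b_b, b_ab


# Hypothetical replicate values (A = infection, B = interferon-gamma)
toy = {(0, 0): [5.0, 5.5], (1, 0): [6.0, 6.5], (0, 1): [7.0, 7.5], (1, 1): [7.5, 8.0]}
print(saturated_2x2_coefficients(toy))  # → (5.25, 1.0, 2.0, -0.5)
```

Here bAB = -0.5: the joint effect (2.5) is smaller than the sum of the single effects (3.0), and a purely additive model without the interaction term could not represent that.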
We set up four different linear regression models in a hierarchical (nested) order. We introduce the eruption plot as a new practical tool for model selection, complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology, we selected the most appropriate model by keeping only those factors that showed additional explanatory power. Applying it to the experimental data allowed us to classify the interaction of factors as neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (co-occurring effects are stronger than expected). We found a biologically meaningful cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes.
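The neutral/alleviating/aggravating distinction can be sketched by comparing an observed joint effect with the additive expectation, i.e. the sum of the single effects (for example on log-expression values). This is only an illustration under stated assumptions: `classify_interaction` is a hypothetical helper, and its fixed tolerance `tol` stands in for whatever significance criterion the authors actually applied.

```python
def classify_interaction(effect_a, effect_b, joint_effect, tol=0.1):
    """Label an interaction by comparing the joint effect with the additive one.

    Hypothetical helper: `tol` is an assumed fixed tolerance, not a
    statistical test. Effects are changes relative to the baseline condition.
    """
    expected = effect_a + effect_b  # additive (no-interaction) expectation
    if abs(joint_effect - expected) <= tol:
        return "neutral"
    if abs(joint_effect) < abs(expected):
        return "alleviating"  # co-occurring effects weaker than expected
    # Stronger than expected (a pure sign flip of equal size also lands here)
    return "aggravating"


print(classify_interaction(1.0, 2.0, 2.5))   # → alleviating
print(classify_interaction(1.0, 2.0, 3.05))  # → neutral
print(classify_interaction(1.0, 2.0, 4.0))   # → aggravating
```

Comparing magnitudes rather than signed differences keeps the labels meaningful when both single effects are negative (down-regulation).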
We introduced the eruption plot as a tool for visual model comparison that identifies relevant higher-order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection among higher-order linear regression models should generally be performed when analyzing multifactorial microarray data.
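The abstract does not name the global test used alongside the eruption plot. One standard way to compare two nested linear models is the partial F-test, F = [(RSS_reduced - RSS_full) / (p_full - p_reduced)] / [RSS_full / (n - p_full)]. The sketch below applies it to a single question: does an interaction term add explanatory power over an additive model? It assumes a balanced 2x2 design (equal replicates per cell, at least two), for which the additive least-squares fit equals row mean + column mean - grand mean; the function name and this design restriction are assumptions for illustration.

```python
from statistics import mean


def interaction_f_statistic(cells):
    """Partial F statistic: additive model (p=3) vs. interaction model (p=4).

    `cells` maps (a, b) with a, b in {0, 1} to replicate values. Hypothetical
    helper; assumes a balanced design with at least two replicates per cell.
    """
    reps = {len(values) for values in cells.values()}
    if set(cells) != {(0, 0), (0, 1), (1, 0), (1, 1)} or len(reps) != 1:
        raise ValueError("expects a balanced 2x2 design")
    r = reps.pop()
    if r < 2:
        raise ValueError("need at least two replicates per cell")
    n, p_full, p_add = 4 * r, 4, 3

    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean(list(cell_mean.values()))  # equals overall mean when balanced
    row = {a: mean([cell_mean[(a, 0)], cell_mean[(a, 1)]]) for a in (0, 1)}
    col = {b: mean([cell_mean[(0, b)], cell_mean[(1, b)]]) for b in (0, 1)}

    # Full (interaction) model: fitted values are the cell means
    rss_full = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)
    # Additive model: fitted values are row + column - grand mean
    rss_add = sum(
        (y - (row[k[0]] + col[k[1]] - grand)) ** 2
        for k, v in cells.items()
        for y in v
    )
    return ((rss_add - rss_full) / (p_full - p_add)) / (rss_full / (n - p_full))


toy = {(0, 0): [5.0, 5.5], (1, 0): [6.0, 6.5], (0, 1): [7.0, 7.5], (1, 1): [7.5, 8.0]}
print(interaction_f_statistic(toy))  # → 1.0
```

For these toy values the statistic is 1.0 on (1, 4) degrees of freedom. A p-value would come from the F distribution, for instance via `scipy.stats.f.sf(F, 1, n - 4)`, which lies outside the standard library; unbalanced designs need a general least-squares fit instead of the row/column shortcut.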