Presson Angela P, Sobel Eric M, Papp Jeanette C, Suarez Charlyn J, Whistler Toni, Rajeevan Mangalathu S, Vernon Suzanne D, Horvath Steve
Biostatistics, University of California, Los Angeles, CA, USA.
BMC Syst Biol. 2008 Nov 6;2:95. doi: 10.1186/1752-0509-2-95.
Systems biologic approaches such as Weighted Gene Co-expression Network Analysis (WGCNA) can effectively integrate gene expression and trait data to identify pathways and candidate biomarkers. Here we show that the additional inclusion of genetic marker data allows one to characterize network relationships as causal or reactive in a chronic fatigue syndrome (CFS) data set.
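One way a genetic marker can orient a gene-trait edge as causal or reactive is through conditional independence, sketched minimally below with numpy. If marker → gene → trait, the marker-trait partial correlation given the gene should vanish. If marker → trait → gene, the marker-gene partial correlation given the trait should vanish instead. This simplified heuristic and the names `partial_corr`, `orient_edge` and `simulate` are illustrative assumptions. The sketch does not reproduce the paper's causality testing procedure, and the simulated data are synthetic.

```python
import numpy as np

def partial_corr(x, y, z):
    """Pearson partial correlation of x and y after adjusting for z."""
    r = np.corrcoef(np.vstack([x, y, z]))
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def orient_edge(marker, gene, trait):
    """Label the gene-trait edge 'causal' or 'reactive' (simplified heuristic)."""
    # Under marker -> gene -> trait, conditioning on the gene removes the marker-trait link.
    resid_if_causal = abs(partial_corr(marker, trait, gene))
    # Under marker -> trait -> gene, conditioning on the trait removes the marker-gene link.
    resid_if_reactive = abs(partial_corr(marker, gene, trait))
    return "causal" if resid_if_causal < resid_if_reactive else "reactive"

def simulate(n, causal, rng):
    """Hypothetical linear data: additive SNP coding 0/1/2, unit-variance noise."""
    marker = rng.integers(0, 3, size=n).astype(float)
    if causal:
        gene = 0.8 * marker + rng.normal(size=n)
        trait = 0.8 * gene + rng.normal(size=n)
    else:
        trait = 0.8 * marker + rng.normal(size=n)
        gene = 0.8 * trait + rng.normal(size=n)
    return marker, gene, trait

rng = np.random.default_rng(0)
print(orient_edge(*simulate(2000, causal=True, rng=rng)))   # causal
print(orient_edge(*simulate(2000, causal=False, rng=rng)))  # reactive
```

Real data rarely follow a clean linear chain, so in practice such comparisons are usually made with likelihood-based model scores rather than a single partial-correlation contrast.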
We combine WGCNA with genetic marker data to identify a disease-related pathway and its causal drivers, an analysis which we refer to as "Integrated WGCNA" or IWGCNA. Specifically, we present the following IWGCNA approach: 1) construct a co-expression network, 2) identify trait-related modules within the network, 3) use a trait-related genetic marker to prioritize genes within the module, 4) apply an integrated gene screening strategy to identify candidate genes and 5) carry out causality testing to verify and/or prioritize results. By applying this strategy to a CFS data set consisting of microarray, SNP and clinical trait data, we identify a module of 299 highly correlated genes that is associated with CFS severity. Our integrated gene screening strategy results in 20 candidate genes. We show that our approach yields biologically interesting genes that function in the same pathway and are causal drivers for their parent module. We use a separate data set to replicate findings and use Ingenuity Pathways Analysis software to functionally annotate the candidate gene pathways.
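Steps 1 through 4 of the approach can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard unsigned WGCNA adjacency a_ij = |cor(x_i, x_j)|^β and defines the module eigengene as the first principal component of the standardized module expression. Module detection (hierarchical clustering in the WGCNA R package) is omitted, so the module's gene indices are taken as given. The names `screen_module`, `mm_min` and `gs_min`, the threshold values and β = 6 are hypothetical, illustrative choices. The simulated data are also synthetic.

```python
import numpy as np

def soft_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)|**beta; expr is samples x genes."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj

def module_eigengene(expr_module):
    """First principal component of the standardized module expression."""
    z = (expr_module - expr_module.mean(axis=0)) / expr_module.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    # Fix the sign so the eigengene tracks the average module expression.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me

def cor_with(expr, v):
    """Pearson correlation of each gene (column) with the vector v."""
    ez = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    vz = (v - v.mean()) / v.std()
    return ez.T @ vz / len(v)

def screen_module(expr, module_idx, trait, snp, beta=6, mm_min=0.8, gs_min=0.2):
    """Indices (within module_idx) passing hypothetical MM/GS cutoffs, ranked by connectivity."""
    sub = expr[:, module_idx]
    k_in = soft_adjacency(sub, beta).sum(axis=1)  # step 1: intramodular connectivity
    me = module_eigengene(sub)                    # step 2: module summary profile
    mm = cor_with(sub, me)                        # module membership
    gs_trait = cor_with(sub, trait)               # gene significance for the trait
    gs_snp = cor_with(sub, snp)                   # step 3: significance for the marker
    keep = (np.abs(mm) >= mm_min) & (np.abs(gs_trait) >= gs_min) & (np.abs(gs_snp) >= gs_min)
    order = np.argsort(-k_in[keep])               # step 4: rank survivors as hub candidates
    return np.flatnonzero(keep)[order]

# Synthetic example: a SNP drives a latent module factor, which drives the trait.
rng = np.random.default_rng(0)
n = 100
snp = rng.integers(0, 3, size=n).astype(float)
hub = snp + rng.normal(size=n)
trait = hub + rng.normal(size=n)
module = hub[:, None] * np.linspace(0.3, 1.5, 20) + rng.normal(size=(n, 20))
expr = np.hstack([module, rng.normal(size=(n, 30))])  # 20 module genes + 30 noise genes
hits = screen_module(expr, np.arange(20), trait, snp)
print(hits)
```

Ranking the surviving genes by intramodular connectivity is one reasonable way to favor hub genes. Ordering them by marker significance would fit the marker-based prioritization step equally well.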
We show how WGCNA can be combined with genetic marker data to identify disease-related pathways and the causal drivers within them. The systems genetics approach described here can easily be used to generate testable genetic hypotheses in other complex disease studies.