Nguyen Phuong, Zeng Erliang
Division of Biostatistics and Computational Biology, College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA.
Informatics Graduate Program, University of Iowa, Iowa City, IA, USA.
Bio Protoc. 2025 Sep 20;15(18):e5447. doi: 10.21769/BioProtoc.5447.
Weighted gene co-expression network analysis (WGCNA) is widely used in transcriptomic studies to identify groups of highly correlated genes (modules), aiding in the understanding of disease mechanisms. Although numerous protocols exist for constructing WGCNA networks from gene expression data, many focus on a single dataset and do not address how to compare module stability across conditions. Here, we present a protocol for constructing and comparing WGCNA modules in paired tumor and normal datasets, enabling the identification of modules involved in core biological processes as well as those specifically related to cancer pathogenesis. By incorporating module preservation analysis, this approach allows researchers to gain deeper insights into the molecular underpinnings of oral cancer and other diseases. Overall, this protocol provides a framework for module preservation analysis in paired datasets, enabling researchers to identify which gene co-expression modules are conserved or disrupted between conditions and thereby advancing our understanding of disease-specific vs. universal biological processes.

Key features
• Presents a step-by-step WGCNA protocol with module preservation and functional enrichment analysis [1,2] using TCGA cancer data, demonstrating network differences between normal and tumor tissues.
• Preprocesses gene expression data and conducts downstream analysis of the constructed networks.
• Requires 2-3 h of hands-on time and 8-12 h of total computational time, depending on dataset size and the number of permutations used for module preservation analysis.
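The two ideas the protocol combines, a soft-thresholded co-expression network and a permutation-based preservation test, can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the protocol's code and not the WGCNA R package's `modulePreservation` function (which combines several density and connectivity statistics into a summary Z-score). The sketch builds an unsigned adjacency a_ij = |cor(x_i, x_j)|^β, takes module membership as already known from the reference (e.g., normal) network, and asks whether the module's mean within-module adjacency in the test (e.g., tumor) data exceeds that of randomly drawn gene sets of the same size. The function names, the power β = 6, the 200 permutations, and the simulated data are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)|**beta.

    expr: array of shape (samples, genes). beta=6 is an illustrative assumption.
    """
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes Pearson correlation
    adj = np.abs(corr) ** beta              # raising to a power suppresses weak correlations
    np.fill_diagonal(adj, 0.0)              # drop self-connections
    return adj

def mean_within_adjacency(adj, idx):
    """Average adjacency over all ordered pairs of distinct genes in idx."""
    sub = adj[np.ix_(idx, idx)]
    n = len(idx)
    return sub.sum() / (n * (n - 1))        # diagonal is zero, so only off-diagonal pairs count

def density_preservation_z(test_expr, module_idx, beta=6, n_perm=200, seed=0):
    """Hypothetical, simplified preservation score (not WGCNA's Zsummary).

    Compares the module's mean adjacency in the test data with a null
    distribution built from random gene sets of the same size.
    """
    rng = np.random.default_rng(seed)
    adj = soft_threshold_adjacency(test_expr, beta)
    module_idx = np.asarray(module_idx)
    observed = mean_within_adjacency(adj, module_idx)
    null = np.array([
        mean_within_adjacency(
            adj, rng.choice(adj.shape[0], size=module_idx.size, replace=False))
        for _ in range(n_perm)
    ])
    return (observed - null.mean()) / null.std(ddof=1)

# Simulated example: 20 "module" genes share a latent factor only when coupled=True.
rng = np.random.default_rng(1)
n_samples, n_genes = 60, 200
module = np.arange(20)

def simulate(coupled):
    x = rng.normal(size=(n_samples, n_genes))
    if coupled:
        latent = rng.normal(size=(n_samples, 1))
        x[:, module] += 2.0 * latent        # induce strong within-module correlation
    return x

z_preserved = density_preservation_z(simulate(True), module)   # module co-expression retained
z_disrupted = density_preservation_z(simulate(False), module)  # module co-expression lost
print(f"preserved Z = {z_preserved:.1f}, disrupted Z = {z_disrupted:.1f}")
```

A large Z suggests the module's co-expression is retained in the test condition, while a Z near zero suggests it is disrupted. Run time scales with `n_perm`, which mirrors the abstract's note that total computational time depends on the permutation number.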