South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Cape Town, South Africa.
Department of Statistics and Population Studies, University of the Western Cape, Cape Town, South Africa.
PLoS One. 2023 Apr 24;18(4):e0284458. doi: 10.1371/journal.pone.0284458. eCollection 2023.
Cancer progression can be tracked through the gene expression changes that accumulate from early-stage to advanced-stage disease. These accumulated genetic changes become detectable when expression levels are highly variable in early-stage samples but less variable in advanced-stage samples. Normalizing advanced-stage expression samples against early-stage samples and then clustering the normalized samples can reveal cancers with similar or different progression, and can provide insight into the clinical and phenotypic patterns of patient samples within the same cancer.
This study aims to investigate cancer progression through RNA-Seq expression profiles across the multi-stage process of cancer development.
RNA-Seq gene expression data for Diffuse Large B-cell Lymphoma, Lung cancer, Liver cancer, Cervical cancer, and Testicular cancer were downloaded from the UCSC Xena database. Advanced-stage samples were normalized against early-stage samples to account for differences in heterogeneity across multi-stage cancer progression. WGCNA was used to build a gene network and to group the normalized genes into modules. Gene set enrichment analysis was then used to select key modules related to cancer. The diagnostic capacity of these modules was evaluated after hierarchical clustering.
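The abstract does not give the normalization formula. As a minimal sketch of one plausible reading, the example below rescales each gene in the advanced-stage samples by the mean and standard deviation of that gene in the early-stage samples. The per-gene z-score, the function name `normalize_with_early_stage`, and the toy gene names and values are all assumptions made for illustration, not the authors' method or data.

```python
from statistics import mean, stdev


def normalize_with_early_stage(advanced, early):
    """Rescale advanced-stage expression by early-stage statistics.

    Both arguments map a gene name to a list of expression values,
    one value per sample. ASSUMPTION: a per-gene z-score against the
    early-stage distribution; the source does not state its formula.
    """
    normalized = {}
    for gene, adv_values in advanced.items():
        early_values = early[gene]
        mu = mean(early_values)
        sigma = stdev(early_values)  # sample standard deviation
        if sigma == 0:
            # No early-stage variability: the z-score is undefined, skip the gene.
            continue
        normalized[gene] = [(v - mu) / sigma for v in adv_values]
    return normalized


# Hypothetical toy data: two genes, three early-stage and two advanced-stage samples.
early = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [5.0, 5.0, 5.0]}
advanced = {"GENE_A": [4.0, 2.0], "GENE_B": [6.0, 7.0]}

result = normalize_with_early_stage(advanced, early)
print(result)
```

In this sketch, genes that are constant across early-stage samples are dropped rather than divided by zero. The normalized values could then be passed to a network or clustering step such as the WGCNA and hierarchical clustering steps named above.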
Unnormalized RNA-Seq gene expression failed to segregate advanced-stage samples by cancer cohort. Normalization against early-stage samples revealed the heterogeneous gene expression that accumulates across multi-stage cancer progression, which resulted in well-segregated cancer samples. Cancer-specific pathways were enriched in the normalized WGCNA modules. The normalization method was also able to stratify patient samples by phenotypic and clinical information. Additionally, the method allowed for patient survival analysis: the Cox regression model selected the gene MAP4K1 in cervical cancer, and Kaplan-Meier analysis confirmed that its upregulation is favourable.
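For readers unfamiliar with the Kaplan-Meier step, the sketch below implements the standard product-limit estimator in plain Python. It is a generic illustration, not the study's analysis code: the function name `kaplan_meier` and the follow-up times and event flags are hypothetical, and a real analysis would compare curves between groups such as high and low expression of a gene.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if the event (death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data: four patients, one censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
for t, s in curve:
    print(t, s)
```

Censored patients contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is what distinguishes this estimator from a simple fraction of survivors.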
Applying the normalization method improved the accuracy with which cancer samples were clustered according to how they progressed. Additionally, genes responsible for cancer progression were identified.