Department of Coloproctology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
Department of Coloproctology, The Sixth Affiliated Hospital of Sun Yat-sen University (Gastrointestinal & Anal Hospital of Sun Yat-sen University), Guangzhou, China.
Mol Genet Genomic Med. 2020 Jul;8(7):e1255. doi: 10.1002/mgg3.1255. Epub 2020 May 12.
Colon cancer (CC) is a common malignant tumor of the colon with high incidence and recurrence rates. This study aimed to build a prognostic model for CC.
The gene expression, microRNA-seq (miRNA-seq), copy number variation (CNV), DNA methylation, and transcription factor (TF) datasets for CC were downloaded from the UCSC Xena database. Differentially methylated genes (DMGs), differentially expressed genes (DEGs), and differentially expressed miRNAs (DEMs) were identified using the limma package. A prognostic model was constructed for each omics dataset using the random forest method. After the omics features related to prognosis were selected using the log-rank test, a prognostic model based on multi-omics features was built. Finally, the clinical phenotypes correlated with prognosis were screened using Kaplan-Meier survival analysis, and a nomogram model was established.
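The feature-screening step described above (Kaplan-Meier estimation followed by a log-rank comparison of two groups) can be illustrated with a minimal sketch. This is not the authors' code: the abstract names only the limma package and does not say which software was used for the survival analysis, so the pure-Python functions below (`kaplan_meier` and `logrank_test`, both hypothetical names) are an assumption made for illustration, and they expect the samples to have already been split into two groups, for example by high versus low expression of a single feature.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event (death) was observed at times[i], 0 if censored.
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(
    times_a: Sequence[float],
    events_a: Sequence[int],
    times_b: Sequence[float],
    events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in group B
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        # Expected deaths in group A under the null hypothesis of equal hazards.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the death count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

In a screening setting of the kind the abstract describes, one would call `logrank_test` once per candidate feature and retain the features whose p-value falls below a chosen threshold. The threshold, and any correction for multiple testing, are not stated in the abstract.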
A total of 1625 DEGs, 268 DEMs, and 386 DMGs were identified between the tumor and normal samples. In the gene expression dataset (e.g., GABRD), miRNA-seq dataset (e.g., miR-1271), CNV dataset (e.g., RN7SKP247), DNA methylation dataset (e.g., the cg09170112 methylation site, located in SFSWAP), and TF dataset (e.g., SIX5), 105, 29, 159, 5, and 6 genes/sites, respectively, were significantly correlated with prognosis. The prognostic model based on multi-omics features was more effective than the models based on a single omics dataset. The number of lymph nodes, pathologic_M stage, and pathologic_T stage were the clinical phenotypes correlated with prognosis, and the nomogram model was constructed from them.
The prognostic model based on multi-omics features and the nomogram model may be valuable for predicting the prognosis of CC.