Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China.
Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, No. 88 Jiefang Road, Hangzhou, 310009, Zhejiang Province, China.
BMC Med Inform Decis Mak. 2020 Feb 7;20(1):22. doi: 10.1186/s12911-020-1043-1.
Colon cancer is common worldwide and is a leading cause of cancer-related death. Advances in sequencing technologies have made multiple levels of omics data available. In this study, we proposed an integrative prognostic model for colon cancer that combines clinical and multi-omics data.
In total, 344 patients were included in this study. Clinical, gene expression, DNA methylation and miRNA expression data were retrieved from The Cancer Genome Atlas (TCGA). To accommodate the high dimensionality of the omics data, unsupervised clustering was used as a dimension reduction method. The bias-corrected Harrell's concordance index was used to determine which clustering result provided the best prognostic performance. Finally, we proposed a prognostic prediction model based on the integration of clinical and multi-omics data. Uno's concordance index with cross-validation was used to compare the discriminative performance of prognostic models constructed with different covariates.
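To make the evaluation metric concrete, the following is a minimal sketch of the plain (not bias-corrected) Harrell's concordance index. It is an illustration under stated assumptions, not the implementation used in the study: the function name `harrell_c_index`, the toy data, and the choice to skip pairs with tied survival times are all assumptions introduced here. The bias correction mentioned above (typically a bootstrap-based optimism adjustment) and Uno's inverse-probability-of-censoring weighting are not shown.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Plain Harrell's C: fraction of comparable patient pairs in which
    the patient with the shorter survival time has the higher risk score.

    times       -- observed follow-up times
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output; higher means worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (not from TCGA): four patients, one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
print(toy_c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the concordance values reported in this abstract should be read.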
Combining clinical and multi-omics data improved prognostic performance: the bias-corrected Harrell's concordance index of the prognostic model increased from 0.7424 (clinical features only) to 0.7604 (clinical features and three types of omics features). Additionally, the 2-year, 3-year and 5-year Uno's concordance statistics increased from 0.7329, 0.7043 and 0.7002 (clinical features only) to 0.7639, 0.7474 and 0.7597 (clinical features and three types of omics features), respectively.
In conclusion, this study successfully combined clinical and multi-omics data for better prediction of colon cancer prognosis.