Piccolo Stephen R, Frey Lewis J
Department of Pharmacology and Toxicology, University of Utah, 201 Presidents Circle, Salt Lake City, UT 84112, USA.
Int J Data Min Bioinform. 2013;7(3):245-65. doi: 10.1504/ijdmb.2013.053310.
Glioblastoma multiforme (GBM), a highly aggressive form of brain cancer, results in a median survival of 12-15 months. For decades, researchers have explored the effects of clinical and molecular factors on this disease and have identified several candidate prognostic markers. In this study, we evaluated the use of multivariate classification models for differentiating between subsets of patients who survive a relatively long or short time. Data for this study came from The Cancer Genome Atlas (TCGA), a public repository containing clinical, treatment, histological and biomolecular variables for hundreds of patients. We applied variable-selection and classification algorithms in a cross-validated design and observed that predictive performance of the resulting models varied substantially across the algorithms and categories of data. The best-performing models were based on age, treatments and global DNA methylation. In this paper, we summarise our findings, discuss lessons learned in analysing TCGA data and offer recommendations for performing such analyses.
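The cross-validated design mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific algorithms used, so everything here is an assumption made for illustration: a mean-difference filter stands in for variable selection, a nearest-centroid rule stands in for the classifier, and the data are synthetic rather than TCGA data. All function names are hypothetical. The point it shows is that variable selection is repeated inside each training fold, so the held-out patients never influence which variables are chosen.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def select_variables(X, y, top):
    """Illustrative filter: rank variables by |difference in class means|."""
    scores = []
    for j in range(len(X[0])):
        m1 = mean(row[j] for row, lab in zip(X, y) if lab == 1)
        m0 = mean(row[j] for row, lab in zip(X, y) if lab == 0)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:top]]

def fit_centroids(X, y, cols):
    """Illustrative classifier: one mean vector per class (nearest centroid)."""
    return {
        label: [mean(row[j] for row, lab in zip(X, y) if lab == label) for j in cols]
        for label in set(y)
    }

def predict(centroids, row, cols):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[j] - cj) ** 2 for j, cj in zip(cols, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def cross_validated_accuracy(X, y, k=5, top=2, seed=0):
    """Estimate accuracy with k-fold CV; selection and fitting use training folds only."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        cols = select_variables(X_tr, y_tr, top)  # never sees the held-out fold
        model = fit_centroids(X_tr, y_tr, cols)
        correct += sum(predict(model, X[i], cols) == y[i] for i in test_idx)
    return correct / len(X)

def make_synthetic(n=100, p=10, seed=1):
    """Synthetic stand-in data: label 1 = 'long survival', label 0 = 'short survival'."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(p)]
        row[0] += 2.0 * label  # only variable 0 carries signal
        X.append(row)
        y.append(label)
    return X, y

X, y = make_synthetic()
accuracy = cross_validated_accuracy(X, y)
print(f"Cross-validated accuracy: {accuracy:.2f}")
```

Selecting variables on the full data set before splitting would leak information from the held-out patients and tend to inflate the estimated performance, which is why the selection step sits inside the loop in this sketch.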