Seemann Lars, Shulman Jason, Gunaratne Gemunu H
Department of Physics, University of Houston, Houston, TX 77204, USA.
Department of Physics, Richard Stockton College of New Jersey, Pomona, NJ 08240, USA.
ISRN Bioinform. 2012 Nov 11;2012:381023. doi: 10.5402/2012/381023. eCollection 2012.
Early and accurate diagnosis of cancer can significantly improve the design of personalized therapy and the success of therapeutic interventions. Histopathological approaches, which rely on microscopic examination of malignant tissue, are not conducive to timely diagnosis. High-throughput genomics offers a possible new classification of cancer subtypes; unfortunately, most clustering algorithms have not been shown to be sufficiently robust. We propose a novel approach that relies on statistical invariants and persistent homology, one of the most exciting recent developments in topology. It identifies a compact yet sufficient set of genes for the analysis, as well as a core group of tightly correlated patient samples for each subtype. Partitioning proceeds hierarchically, which allows genetically similar subtypes to be identified. We analyzed the gene expression profiles of 202 tumors of the brain cancer glioblastoma multiforme (GBM) available from The Cancer Genome Atlas (TCGA). We identify core patient groups associated with the classical, mesenchymal, and proneural subtypes of GBM. In our analysis, the neural subtype consists of several small groups rather than a single component. We also introduce a subtype prediction model that partitions tumors consistently with clustering algorithms but requires the genetic signature of only 59 genes.
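The abstract names persistent homology without describing it. As a minimal sketch of the underlying idea only (not the authors' pipeline, which also uses statistical invariants to select genes), 0-dimensional persistent homology tracks how groups of samples merge as a distance threshold grows: each merge "kills" a component, and groups that survive over a long range of thresholds are robust. For a Vietoris-Rips filtration this is equivalent to single-linkage clustering. The distance `1 - Pearson correlation`, the function names `h0_persistence` and `n_components`, and the toy expression matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def h0_persistence(data):
    """0-dimensional persistence pairs for the rows (samples) of `data`.

    Assumption for illustration: distance between samples is 1 - Pearson correlation.
    Every sample is born at scale 0; each merge of two components records one death.
    """
    dist = 1.0 - np.corrcoef(data)  # sample-by-sample correlation distance
    n = dist.shape[0]
    parent = list(range(n))  # union-find forest over samples

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # The filtration adds edges in order of increasing distance.
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: one of them dies at scale d
            parent[ri] = rj
            pairs.append((0.0, float(d)))
    pairs.append((0.0, float("inf")))  # the last component never dies
    return pairs


def n_components(pairs, scale):
    """Number of components still separate at the given distance scale."""
    return sum(1 for _, death in pairs if death > scale)


# Hypothetical toy data: 4 samples x 4 genes, two anti-correlated expression patterns.
X = np.array([
    [1, 2, 3, 4],
    [1, 2, 3, 5],
    [4, 3, 2, 1],
    [5, 3, 2, 1],
], dtype=float)

pairs = h0_persistence(X)
```

On this toy matrix, the two rising profiles and the two falling profiles each merge almost immediately (death near 0.017), while the two groups only merge at a distance near 1.94, so `n_components(pairs, 0.5)` reports two persistent groups. The large gap between early and late deaths is what marks a core group as robust in this simplified picture.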