Seifert Michael, Abou-El-Ardat Khalil, Friedrich Betty, Klink Barbara, Deutsch Andreas
Center for Information Services and High Performance Computing, Dresden University of Technology, Dresden, Germany.
Institute for Clinical Genetics, Faculty of Medicine Carl Gustav Carus, Dresden University of Technology, Dresden, Germany.
PLoS One. 2014 Jun 23;9(6):e100295. doi: 10.1371/journal.pone.0100295. eCollection 2014.
Changes in gene expression programs play a central role in cancer. Chromosomal aberrations such as deletions, duplications and translocations of DNA segments can lead to highly significant positive correlations between the expression levels of neighboring genes. This should be utilized to improve the analysis of tumor expression profiles. Here, we develop a novel model class of autoregressive higher-order Hidden Markov Models (HMMs) that carefully exploit local data-dependent chromosomal dependencies to improve the identification of differentially expressed genes in tumors. Autoregressive higher-order HMMs overcome general limitations of standard first-order HMMs in modeling dependencies between genes in close chromosomal proximity by simultaneously using higher-order state-transitions and autoregressive emissions as novel model features. We apply autoregressive higher-order HMMs to the analysis of breast cancer and glioma gene expression data and perform in-depth model evaluation studies. We find that autoregressive higher-order HMMs clearly improve the identification of overexpressed genes with underlying gene copy number duplications in breast cancer in comparison to mixture models, standard first- and higher-order HMMs, and other related methods. The performance benefit is attributed to the simultaneous usage of higher-order state-transitions in combination with autoregressive emissions. This benefit could not be reached by using either of these two features on its own. We also find that autoregressive higher-order HMMs are better able to identify differentially expressed genes in tumors, independent of the underlying gene copy number status, than the majority of related methods. This is further supported by the identification of well-known and previously unreported hotspots of differential expression in glioblastomas, demonstrating the efficacy of autoregressive higher-order HMMs for the analysis of individual tumor expression profiles. Moreover, we reveal interesting novel details of systematic alterations of gene expression levels in known cancer signaling pathways distinguishing oligodendrogliomas, astrocytomas and glioblastomas. An implementation is available at www.jstacs.de/index.php/ARHMM.
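The two model features named in the abstract, higher-order state-transitions and autoregressive emissions, can be illustrated with a minimal sketch. It is not the authors' implementation (that is the Jstacs-based software at the URL above). It assumes three states, a second-order transition table, a Gaussian emission whose mean is shifted by the previous gene's value (an AR(1) term), and hand-picked parameters. All names (viterbi, log_trans2, phi, stay_row) and all numbers are hypothetical, and decoding uses plain Viterbi rather than whatever inference the paper applies.

```python
import math
from itertools import product

def log_normal_pdf(x, mean, sd):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def log_emission(s, x, x_prev, mu, phi, sd):
    """AR(1) emission (assumed form): state mean shifted by phi * previous value."""
    return log_normal_pdf(x, mu[s] + phi[s] * x_prev, sd[s])

def viterbi(x, log_init, log_trans1, log_trans2, mu, phi, sd):
    """Most likely state path for a second-order HMM with AR(1) emissions.

    log_trans1[a][b]    = log P(s_1 = b | s_0 = a)
    log_trans2[a][b][c] = log P(s_t = c | s_{t-2} = a, s_{t-1} = b)
    """
    n, T = len(mu), len(x)
    if T == 0:
        return []
    # The first gene has no predecessor, so its AR term is dropped (x_prev = 0).
    first = [log_init[s] + log_emission(s, x[0], 0.0, mu, phi, sd) for s in range(n)]
    if T == 1:
        return [max(range(n), key=first.__getitem__)]
    # delta[(a, b)]: best log score of any path whose last two states are (a, b).
    delta = {
        (a, b): first[a] + log_trans1[a][b] + log_emission(b, x[1], x[0], mu, phi, sd)
        for a, b in product(range(n), repeat=2)
    }
    back = []  # back[k][(b, c)] = best state two positions before position k + 2
    for t in range(2, T):
        new_delta, ptr = {}, {}
        for b, c in product(range(n), repeat=2):
            best_a = max(range(n), key=lambda a: delta[(a, b)] + log_trans2[a][b][c])
            ptr[(b, c)] = best_a
            new_delta[(b, c)] = (delta[(best_a, b)] + log_trans2[best_a][b][c]
                                 + log_emission(c, x[t], x[t - 1], mu, phi, sd))
        delta = new_delta
        back.append(ptr)
    # Backtrack from the best final pair of states.
    path = list(max(delta, key=delta.get))
    for ptr in reversed(back):
        path.insert(0, ptr[(path[0], path[1])])
    return path

# Toy parameters (hypothetical, chosen by hand, not estimated from data).
STATES = ("under", "unchanged", "over")
mu, phi, sd = [-2.0, 0.0, 2.0], [0.1, 0.1, 0.1], [0.5, 0.5, 0.5]
n = len(STATES)
log_init = [math.log(1.0 / n)] * n

def stay_row(stay, b):
    """Log transition row that keeps state b with probability `stay`."""
    return [math.log(stay if c == b else (1.0 - stay) / (n - 1)) for c in range(n)]

log_trans1 = [stay_row(0.8, b) for b in range(n)]
# Second order: staying is more likely when the two previous states agree.
log_trans2 = [[stay_row(0.9 if a == b else 0.6, b) for b in range(n)] for a in range(n)]

# Tumor-vs-normal log-ratios of eight genes in chromosomal order (made-up values).
log_ratios = [0.0, 0.1, -0.1, 2.2, 2.0, 2.1, 0.1, 0.0]
path = viterbi(log_ratios, log_init, log_trans1, log_trans2, mu, phi, sd)
labels = [STATES[s] for s in path]
print(labels)
```

In this toy setting, the transition table rewards runs of identical states along the chromosome, while the autoregressive term lets the expected expression of a gene depend on its neighbor. The abstract reports that it is this combination which yields the performance benefit.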