Center for Systems and Computational Biology, Molecular and Cellular Oncogenesis Program, The Wistar Institute, Philadelphia, PA 19104, USA.
Department of Surgery, Abramson Cancer Center, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.
Genome Med. 2013 Apr 17;5(4):33. doi: 10.1186/gm437. eCollection 2013.
The majority of mammalian genes generate multiple transcript variants and protein isoforms through alternative transcription and/or alternative splicing, and the dynamic changes at the transcript/isoform level between non-oncogenic and cancer cells remain largely unexplored. We hypothesized that isoform-level expression profiles would be better than gene-level expression profiles at discriminating between non-oncogenic and cancer cells.
We analyzed 160 Affymetrix exon-array datasets, comprising cell lines of non-oncogenic or oncogenic tissue origins. We obtained transcript-level and gene-level expression estimates, and used unsupervised and supervised clustering algorithms to study the profile similarity between the samples at both the gene and isoform levels.
Hierarchical clustering based on isoform-level expression effectively grouped the non-oncogenic and oncogenic cell lines with a virtually perfect homogeneity-grouping rate (97.5%), regardless of the tissue origin of the cell lines. However, this rate was much lower at the gene level, being 75% at best. Statistical analyses of the difference between cancer and non-oncogenic samples identified numerous genes with differentially expressed isoforms that were not significant at the gene level. We also found that the canonical pathways of protein ubiquitination, purine metabolism, and breast-cancer regulation by stathmin1 were significantly enriched among genes that show differential expression at the isoform level but not at the gene level.
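As a rough illustration of the kind of analysis described above, the following sketch clusters a handful of made-up expression profiles and scores how cleanly the clusters separate the two classes. It is not the authors' pipeline: the abstract does not define the homogeneity-grouping rate, so cluster purity is used here as an assumed stand-in, and the average-linkage rule, Euclidean distance, function names, and toy values are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations
from math import dist


def average_linkage(profiles, a, b):
    """Mean Euclidean distance over all cross-cluster sample pairs."""
    total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(profiles, n_clusters=2):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(profiles, *pair),
        )
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters


def purity(clusters, labels):
    """Fraction of samples that carry their cluster's majority label.

    Assumption: used as a stand-in for the homogeneity-grouping rate,
    which the abstract does not define.
    """
    majority = sum(
        Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters
    )
    return majority / len(labels)


# Toy isoform-level profiles (invented values, one row per cell line).
profiles = [
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.0],
    [0.9, 0.8, 1.1],
    [3.0, 3.2, 2.9],
    [3.1, 2.8, 3.0],
    [2.9, 3.0, 3.3],
]
labels = ["non-oncogenic"] * 3 + ["oncogenic"] * 3

clusters = agglomerate(profiles, n_clusters=2)
print(purity(clusters, labels))
```

Running the same two functions once on isoform-level estimates and once on gene-level estimates for the same samples would give two purity values to compare, which mirrors the 97.5% versus 75% contrast reported above only in spirit.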
In summary, cancer cell lines, regardless of their tissue of origin, can be effectively discriminated from non-cancer cell lines at isoform level, but not at gene level. This study suggests the existence of an isoform signature, rather than a gene signature, which could be used to distinguish cancer cells from normal cells.