FIMM, Institute of Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.
PLoS One. 2011 Feb 18;6(2):e17259. doi: 10.1371/journal.pone.0017259.
Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type ('outlier genes'), a hallmark of potential oncogenes.
A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. Using simulated data, we compared the ability of the GTI to detect outlier genes in meta-datasets with that of four previously defined statistical methods: COPA, the OS statistic, the t-test and ORT. We demonstrated that the GTI performed as well as existing methods in a single-study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. In this analysis, the GTI identified known oncogenic outlier genes better than most of the previous methods. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target.
CONCLUSIONS/SIGNIFICANCE: Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. The algorithm is implemented in an R package (Text S1).
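For readers unfamiliar with outlier statistics, the following is a minimal Python sketch of a COPA-style score, one of the comparison methods named above. It is not the GTI and not the authors' R package, and the abstract does not give the details of either. The sketch assumes the commonly described COPA recipe: centre each gene on its median over all samples, scale by its median absolute deviation (MAD), and take an upper percentile of the scaled tumour values. The function name `copa_scores`, the 90th-percentile default and the simulated data are illustrative assumptions.

```python
import numpy as np


def copa_scores(expr, is_tumour, q=90.0):
    """COPA-style outlier score per gene (illustrative sketch, not the GTI).

    expr      : array of shape (genes, samples) with expression values
    is_tumour : boolean array of length `samples`, True for tumour samples
    q         : percentile of the scaled tumour values used as the score
    """
    expr = np.asarray(expr, dtype=float)
    is_tumour = np.asarray(is_tumour, dtype=bool)

    # Centre each gene on its median across all samples.
    med = np.median(expr, axis=1, keepdims=True)
    # Scale by the median absolute deviation; genes with MAD == 0 become NaN.
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True)
    mad = np.where(mad == 0, np.nan, mad)
    scaled = (expr - med) / mad

    # A high upper percentile among tumour samples flags a gene that is
    # over-expressed in only a subset of tumours.
    return np.percentile(scaled[:, is_tumour], q, axis=1)


# Simulated data (assumption): 50 genes, 20 normal and 20 tumour samples.
rng = np.random.default_rng(0)
n_genes, n_normal, n_tumour = 50, 20, 20
expr = rng.normal(size=(n_genes, n_normal + n_tumour))
is_tumour = np.array([False] * n_normal + [True] * n_tumour)

# Gene 0 is strongly over-expressed in 5 of the 20 tumour samples only.
expr[0, -5:] += 10.0

scores = copa_scores(expr, is_tumour)
top_gene = int(np.argmax(scores))
print("highest-scoring gene index:", top_gene)
```

The percentile has to be chosen with the expected outlier fraction in mind: a gene over-expressed in fewer than (100 - q)% of tumours may not move the q-th percentile at all. This is one reason several outlier statistics exist and are compared in studies such as this one.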