Shi Xingjie, Shen Shihao, Liu Jin, Huang Jian, Zhou Yong, Ma Shuangge
Brief Bioinform. 2014 Sep;15(5):671-84. doi: 10.1093/bib/bbt044. Epub 2013 Jun 19.
Gene expression profiling has been extensively conducted in cancer research. The analysis of multiple independent cancer gene expression datasets may provide additional information and complement single-dataset analysis. In this study, we conduct multi-dataset analysis to evaluate the similarity of cancer-associated genes identified from different datasets. The first objective of this study is to briefly review statistical methods that can be used for such evaluation, covering both marginal analysis and joint analysis methods. The second objective is to apply those methods to 26 Gene Expression Omnibus (GEO) datasets on five types of cancers. Our analysis suggests that, for the same cancer, the marker identification results may vary significantly across datasets, and different datasets share few common genes. Datasets on different cancers also share few common genes. The shared genetic basis of datasets on the same or different cancers, which has been suggested in the literature, is not observed in the analysis of GEO data.