Bioinformatics Research Center, North Carolina State University, Ricks Hall, 1 Lampe Dr., Raleigh, NC 27607, USA; Division of Bioinformatics, Omicsoft Inc., 200 Cascade Pointe Lane, Suite 101, Cary, NC 27513, USA; Department of Statistics, North Carolina State University, Ricks Hall, 1 Lampe Dr., Raleigh, NC 27607, USA; and Department of Statistics, National Cheng-Kung University, No.1, University Road, Tainan 701, Taiwan.
Bioinformatics. 2014 Jun 1;30(11):1501-7. doi: 10.1093/bioinformatics/btu060. Epub 2014 Jan 30.
Gene set analysis is a popular method for large-scale genomic studies. Because genes that share biological features are analyzed jointly, gene set analysis often achieves better power and generates more biologically informative results. As technologies have advanced, genomic studies with multi-platform data have become increasingly common, and several strategies have been proposed that integrate genomic data from multiple platforms to perform gene set analysis. To evaluate the performance of existing integrative gene set methods under various scenarios, we conduct a comparative simulation analysis based on The Cancer Genome Atlas breast cancer dataset.
We find that existing methods for gene set analysis are less effective when sample heterogeneity exists. To address this issue, we develop three methods for multi-platform genomic data with heterogeneity: two non-parametric methods, multi-platform Mann-Whitney statistics and multi-platform outlier robust T-statistics, and a parametric method, multi-platform likelihood ratio statistics. Using simulations, we show that the proposed multi-platform Mann-Whitney statistics method has higher power for heterogeneous samples and comparable performance for homogeneous samples when compared with the existing methods. Our real data applications to two datasets of The Cancer Genome Atlas also suggest that the proposed methods are able to identify novel pathways that are missed by other strategies.
http://www4.stat.ncsu.edu/~jytzeng/Software/Multiplatform_gene_set_analysis/