Comparative evaluation of gene-set analysis methods.

Authors

Liu Qi, Dinu Irina, Adewale Adeniyi J, Potter John D, Yasui Yutaka

Affiliation

School of Public Health, University of Alberta, Edmonton, Alberta, T6G2G3, Canada.

Publication

BMC Bioinformatics. 2007 Nov 7;8:431. doi: 10.1186/1471-2105-8-431.

Abstract

BACKGROUND

Multiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype. Following Goeman and Bühlmann's recent review, we compared the statistical performance of three methods, namely Global Test, ANCOVA Global Test, and SAM-GS, that test "self-contained null hypotheses" via subject sampling. The three methods were compared based on a simulation experiment and analyses of three real-world microarray datasets.

RESULTS

In the simulation experiment, we found that using the asymptotic distribution in the two Global Tests leads to a statistical test with an incorrect size. Specifically, p-values calculated from the scaled chi-squared distribution of Global Test and from the asymptotic distribution of ANCOVA Global Test are too liberal, while the asymptotic distribution with a quadratic form of Global Test yields p-values that are too conservative. The two Global Tests with permutation-based inference, however, gave a correct size. While the three methods showed similar power using permutation inference after a proper standardization of gene expression data, SAM-GS showed slightly higher power than the Global Tests. In the analysis of a real-world microarray dataset, the two Global Tests gave markedly different results from SAM-GS in identifying pathways whose gene expression is associated with p53 mutation in cancer cell lines. A proper standardization of gene expression variances is necessary for the two Global Tests to produce biologically sensible results. After the standardization, the three methods gave very similar, biologically sensible results, with slightly higher statistical significance given by SAM-GS. The three methods gave similar patterns of results in the analysis of the other two microarray datasets.
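
The permutation-based inference described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the statistic is a SAM-GS-style sum of squared moderated t-like statistics, the constant `s0` is a hypothetical fixed value (not estimated from the data), and the per-gene standardization, function names, and parameters are all chosen here for illustration only.

```python
import numpy as np


def standardize_genes(x):
    """Scale each gene (row) to zero mean and unit variance across samples."""
    centered = x - x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    # Guard against constant genes to avoid division by zero.
    return centered / np.where(sd > 0, sd, 1.0)


def gene_set_statistic(x, labels, s0=0.1):
    """Sum over genes of squared moderated t-like statistics.

    x: (genes, samples) array; labels: boolean array, True = group 1.
    s0: hypothetical small constant that stabilizes the denominator.
    """
    a, b = x[:, labels], x[:, ~labels]
    n1, n2 = a.shape[1], b.shape[1]
    diff = a.mean(axis=1) - b.mean(axis=1)
    pooled_var = (
        (n1 - 1) * a.var(axis=1, ddof=1) + (n2 - 1) * b.var(axis=1, ddof=1)
    ) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return float(np.sum((diff / (se + s0)) ** 2))


def permutation_p_value(x, labels, n_perm=1000, seed=0):
    """Permutation p-value from shuffling phenotype labels across subjects."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = gene_set_statistic(x, labels)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling labels breaks any gene-phenotype association while
        # preserving the correlation structure among genes in the set.
        if gene_set_statistic(x, rng.permutation(labels)) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Calling `permutation_p_value(standardize_genes(x), labels)` on a genes-by-samples matrix returns a p-value for the self-contained null hypothesis that no gene in the set is associated with the binary phenotype. Because the null distribution is built from the data by relabeling subjects, the test does not rely on an asymptotic approximation.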

CONCLUSION

An appropriate standardization makes the performance of all three methods similar, given the use of permutation-based inference. SAM-GS tends to have slightly higher power in the lower alpha-level region (i.e. gene sets that are of the greatest interest). Global Test and ANCOVA Global Test have the important advantage of being able to analyze continuous and survival phenotypes and to adjust for covariates. A free Microsoft Excel Add-In to perform SAM-GS is available from http://www.ualberta.ca/~yyasui/homepage.html.

References

1. Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics. 2007 Jul 5;8:242. doi: 10.1186/1471-2105-8-242.
2. Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics. 2007 Apr 15;23(8):980-7. doi: 10.1093/bioinformatics/btm051. Epub 2007 Feb 15.
3. Parameter estimation for the calibration and variance stabilization of microarray data. Stat Appl Genet Mol Biol. 2003;2:Article3. doi: 10.2202/1544-6115.1008. Epub 2003 Apr 5.
4. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50. doi: 10.1073/pnas.0506580102. Epub 2005 Sep 30.
5. Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci U S A. 2005 Sep 20;102(38):13544-9. doi: 10.1073/pnas.0506577102. Epub 2005 Sep 8.
6. Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics. 2005 Sep 12;6:225. doi: 10.1186/1471-2105-6-225.
8. Testing association of a pathway with survival using gene expression data. Bioinformatics. 2005 May 1;21(9):1950-7. doi: 10.1093/bioinformatics/bti267. Epub 2005 Jan 18.
9. Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics. 2005 May 1;21(9):1943-9. doi: 10.1093/bioinformatics/bti260. Epub 2005 Jan 12.
10. A global test for groups of genes: testing association with a clinical outcome. Bioinformatics. 2004 Jan 1;20(1):93-9. doi: 10.1093/bioinformatics/btg382.
