Multivariate analysis of variance test for gene set analysis.

Author information

Tsai Chen-An, Chen James J

Affiliation

Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan.

Publication information

Bioinformatics. 2009 Apr 1;25(7):897-903. doi: 10.1093/bioinformatics/btp098. Epub 2009 Mar 2.

DOI:10.1093/bioinformatics/btp098

PMID:19254923

Abstract

MOTIVATION

Gene class testing (GCT), or gene set analysis (GSA), is a statistical approach for determining whether functionally predefined sets of genes are expressed differently under different experimental conditions. Shortcomings of Fisher's exact test for overrepresentation analysis are illustrated by an example. Most alternative GSA methods were developed for data collected under two experimental conditions, and most either are based on a univariate gene-by-gene test statistic or assume independence among the genes in the gene set. A multivariate analysis of variance (MANOVA) approach is proposed for studies with two or more experimental conditions.
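For orientation, the overrepresentation test mentioned above can be sketched as a one-sided hypergeometric tail probability. This is a minimal illustration, not the example from the paper; the function name `ora_p_value` and its parameters are hypothetical. It also makes visible that the test sees only counts derived from a thresholded list of differentially expressed genes, not the expression values themselves.

```python
from math import comb

def ora_p_value(n_universe, n_de, set_size, overlap):
    """One-sided Fisher's exact (hypergeometric) p-value, P(X >= overlap).

    n_universe: number of genes measured
    n_de:       number of genes called differentially expressed
    set_size:   number of genes in the gene set
    overlap:    number of gene-set genes that are in the DE list
    """
    upper = min(n_de, set_size)
    total = comb(n_universe, set_size)
    # Sum the hypergeometric probabilities for overlaps at least as large
    # as the observed one.
    tail = sum(
        comb(n_de, k) * comb(n_universe - n_de, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts, for illustration only.
print(ora_p_value(n_universe=10000, n_de=500, set_size=40, overlap=8))
```

Because the inputs are counts, two gene sets with the same overlap receive the same p-value regardless of how strongly or how coherently their genes change.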

RESULTS

When the number of genes in the gene set exceeds the number of samples, the sample covariance matrix is singular and ill-conditioned, and standard multivariate methods can yield biased results. The proposed MANOVA test replaces the sample covariance matrix with a shrinkage covariance matrix estimator. The MANOVA test and six other published GSA methods (principal component analysis, SAM-GS, analysis of covariance, Global, GSEA and MaxMean) are evaluated by simulation. The MANOVA test appears to perform best in terms of type I error control and power under the models considered in the simulation. Several publicly available microarray datasets with two and three experimental conditions are analyzed to illustrate GSA. Most methods, except for GSEA and MaxMean, are generally comparable in their power to identify significant gene sets.
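The singularity problem and the general idea of a shrinkage fix can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' R code. The abstract does not specify the shrinkage estimator, the MANOVA statistic, or how p-values are obtained, so the choices below are assumptions: shrinkage toward the diagonal with a fixed, hypothetical intensity `lam`, a Wilks-type determinant ratio, and permutation p-values. All function names are hypothetical.

```python
import numpy as np

def shrink_cov(X, lam=0.2):
    """Shrink the sample covariance of X (samples x genes) toward its diagonal.

    lam is a fixed, hypothetical intensity; a data-driven choice would
    normally be used instead.
    """
    S = np.cov(X, rowvar=False)
    return (1.0 - lam) * S + lam * np.diag(np.diag(S))

def wilks_stat(X, groups, lam=0.2):
    """Wilks-type ratio det(within) / det(total) using shrunken covariances.

    Values near 0 suggest group differences; values near 1 suggest none.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    resid = X.copy()
    for g in np.unique(groups):
        mask = groups == g
        resid[mask] -= X[mask].mean(axis=0)  # remove each group's mean
    # Both matrices use the same n - 1 divisor, so it cancels in the ratio.
    _, logdet_within = np.linalg.slogdet(shrink_cov(resid, lam))
    _, logdet_total = np.linalg.slogdet(shrink_cov(X, lam))
    return float(np.exp(logdet_within - logdet_total))

def permutation_p_value(X, groups, n_perm=199, lam=0.2, seed=0):
    """Permutation p-value: fraction of label shuffles with a statistic
    at least as small as the observed one."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = wilks_stat(X, groups, lam)
    hits = sum(
        wilks_stat(X, rng.permutation(groups), lam) <= observed
        for _ in range(n_perm)
    )
    return (1 + hits) / (1 + n_perm)

# Simulated data: 12 samples in 3 conditions, 30 genes (more genes than samples).
rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 4)
X = rng.normal(size=(12, 30))
X[groups == 1] += 5.0  # shift one condition so the set is truly affected

print(np.linalg.matrix_rank(np.cov(X, rowvar=False)))  # rank of the raw estimate
print(np.linalg.matrix_rank(shrink_cov(X)))            # rank after shrinkage
print(permutation_p_value(X, groups))
```

With any `lam` greater than 0 and nonzero gene variances, the shrunken matrix is positive definite, so its determinant is well defined even when the raw sample covariance is rank-deficient.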

AVAILABILITY

Free R code to perform the MANOVA test is available at http://mail.cmu.edu.tw/~catsai/research.htm.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
