Multivariate analysis of variance test for gene set analysis.

Authors

Tsai Chen-An, Chen James J

Affiliation

Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan.

Publication

Bioinformatics. 2009 Apr 1;25(7):897-903. doi: 10.1093/bioinformatics/btp098. Epub 2009 Mar 2.

Abstract

MOTIVATION

Gene class testing (GCT) or gene set analysis (GSA) is a statistical approach for determining whether functionally predefined sets of genes are expressed differently under different experimental conditions. Shortcomings of Fisher's exact test for overrepresentation analysis are illustrated by an example. Most alternative GSA methods were developed for data collected under two experimental conditions, and most either are based on a univariate gene-by-gene test statistic or assume independence among genes in the gene set. A multivariate analysis of variance (MANOVA) approach is proposed for studies with two or more experimental conditions.
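For readers unfamiliar with overrepresentation analysis, the following is a minimal sketch of the one-sided Fisher's exact (hypergeometric) test it typically relies on. The function name and all counts are hypothetical illustrations, not values from the paper. Note that the test sees only a thresholded list of "significant" genes and treats genes as independent, which is the kind of limitation the abstract alludes to.

```python
from math import comb


def ora_pvalue(n_total, n_in_set, n_significant, n_overlap):
    """One-sided Fisher's exact p-value for overrepresentation.

    Probability of observing at least `n_overlap` gene-set members among
    `n_significant` genes drawn without replacement from `n_total` genes,
    of which `n_in_set` belong to the gene set.
    """
    upper = min(n_in_set, n_significant)
    denom = comb(n_total, n_significant)
    # Sum the hypergeometric probabilities over the upper tail.
    tail = sum(
        comb(n_in_set, k) * comb(n_total - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 10000 genes on the array, 50 in the gene set,
# 200 called significant by a gene-by-gene test, 5 of those in the set.
p = ora_pvalue(10000, 50, 200, 5)
print(f"ORA p-value: {p:.4g}")
```

The result depends entirely on the cutoff used to call genes significant, so the same gene set can look enriched or not under different thresholds.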

RESULTS

When the number of genes in a gene set exceeds the number of samples, the sample covariance matrix is singular and ill-conditioned, and standard multivariate methods can give biased results. The proposed MANOVA test replaces the sample covariance matrix with a shrinkage covariance matrix estimator. The MANOVA test and six other published GSA methods (principal component analysis, SAM-GS, analysis of covariance, Global, GSEA and MaxMean) are evaluated by simulation. The MANOVA test appears to perform best in terms of type I error control and power under the models considered in the simulation. Several publicly available microarray datasets with two and three experimental conditions are analyzed to illustrate GSA. Most methods, except GSEA and MaxMean, are generally comparable in their power to identify significant gene sets.
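The idea of a shrinkage-based multivariate test can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the shrinkage target (the diagonal of the pooled covariance), the fixed intensity `lam`, the Lawley-Hotelling-style trace statistic, and the permutation p-value are all choices made here for simplicity, and every function name is hypothetical. The abstract does not specify the estimator or statistic the paper uses.

```python
import numpy as np


def shrunken_within_cov(X, labels, lam):
    """Pooled within-group covariance shrunk toward its own diagonal.

    X is (samples x genes). lam in [0, 1] is an assumed, fixed shrinkage
    intensity; lam = 0 returns the ordinary pooled sample covariance.
    """
    groups = np.unique(labels)
    n, p = X.shape
    W = np.zeros((p, p))
    for g in groups:
        Xg = X[labels == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
    S = W / (n - len(groups))
    target = np.diag(np.diag(S))
    return (1.0 - lam) * S + lam * target


def between_scatter(X, labels):
    """Between-group sum-of-squares-and-cross-products matrix."""
    grand = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    for g in np.unique(labels):
        Xg = X[labels == g]
        d = (Xg.mean(axis=0) - grand)[:, None]
        B += Xg.shape[0] * (d @ d.T)
    return B


def trace_statistic(X, labels, lam=0.5):
    """trace(S_shrunk^{-1} B): larger values suggest group differences."""
    S = shrunken_within_cov(X, labels, lam)
    B = between_scatter(X, labels)
    return float(np.trace(np.linalg.solve(S, B)))


def permutation_pvalue(X, labels, lam=0.5, n_perm=500, seed=0):
    """Permutation p-value obtained by shuffling the condition labels."""
    rng = np.random.default_rng(seed)
    observed = trace_statistic(X, labels, lam)
    hits = 0
    for _ in range(n_perm):
        if trace_statistic(X, rng.permutation(labels), lam) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Simulated example: 12 samples, 3 conditions, a 30-gene set (genes > samples).
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 30))
labels = np.repeat([0, 1, 2], 4)
print(permutation_pvalue(X, labels, lam=0.5, n_perm=200))
```

With 30 genes and 12 samples the unshrunk pooled covariance has rank at most 9 and cannot be inverted. Any `lam` greater than 0 makes the estimate positive definite as long as every gene has nonzero within-group variance, which is what allows the statistic to be computed.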

AVAILABILITY

Free R code to perform the MANOVA test is available at http://mail.cmu.edu.tw/~catsai/research.htm.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
