Zhou Yiyong, Cras-Méneur Corentin, Ohsugi Mitsuru, Stormo Gary D, Permutt M Alan
Division of Endocrinology, Metabolism and Lipid Research, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA.
Bioinformatics. 2007 Aug 15;23(16):2073-9. doi: 10.1093/bioinformatics/btm292. Epub 2007 Jun 5.
Most current methods for identifying differentially expressed genes fall into the category of so-called single-gene analysis, which performs hypothesis testing on a gene-by-gene basis. In a single-gene analysis approach, the variability of each gene must be estimated to determine whether that gene is differentially expressed. Because these variability estimates are often inaccurate, genes with small fold-changes are difficult to identify unless a very large number of replicate experiments is performed.
We propose a method that avoids the difficult task of estimating variability for each gene while reliably identifying a group of differentially expressed genes with low false discovery rates, even when the fold-changes are very small. In this article, we establish a new characterization of differentially expressed genes based on a theorem about the distribution of ranks of genes sorted by (log) ratios within each array. This rank-based characterization is an example of all-gene analysis rather than single-gene analysis. We apply the method to a cDNA microarray dataset and reliably identify many genes with low fold-changes (as low as 1.3-fold) without carrying out hypothesis testing on a gene-by-gene basis. The false discovery rate is estimated in two different ways, both of which reflect the variability of all the genes and avoid the complications of multiple hypothesis testing. We also compare our approach with methods based on single-gene analysis.
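The general idea of ranking genes within each array can be illustrated with a minimal sketch. This is not the theorem or the procedure of the article, whose details are not given in the abstract. It assumes a simple stand-in rule: a gene is flagged if its (log) ratio ranks in the top fraction of every replicate array, and the false discovery rate is approximated by the number of genes expected to do so by chance if ranks were independent and uniformly distributed across arrays. The function names and the `top_fraction` parameter are hypothetical.

```python
import numpy as np


def rank_within_arrays(log_ratios):
    """Rank genes within each array (column); rank 1 is the smallest log ratio."""
    n_genes, n_arrays = log_ratios.shape
    order = np.argsort(log_ratios, axis=0, kind="stable")
    ranks = np.empty_like(order)
    for j in range(n_arrays):
        ranks[order[:, j], j] = np.arange(1, n_genes + 1)
    return ranks


def consistently_top_ranked(log_ratios, top_fraction=0.1):
    """Indices of genes ranked in the top fraction of every array.

    Illustrative rule only (an assumption, not the article's criterion).
    """
    n_genes = log_ratios.shape[0]
    k = int(np.floor(top_fraction * n_genes))  # size of the top group
    ranks = rank_within_arrays(log_ratios)
    in_top = ranks > (n_genes - k)
    return np.flatnonzero(in_top.all(axis=1))


def expected_by_chance(n_genes, n_arrays, top_fraction):
    """Expected number of flagged genes if ranks were independent and uniform.

    Each gene lands in the top k of one array with probability k / n_genes,
    so it lands there in all arrays with probability (k / n_genes) ** n_arrays.
    """
    k = int(np.floor(top_fraction * n_genes))
    return n_genes * (k / n_genes) ** n_arrays


def estimated_fdr(log_ratios, top_fraction=0.1):
    """Ratio of chance-expected to observed flagged genes (NaN if none flagged)."""
    n_genes, n_arrays = log_ratios.shape
    observed = consistently_top_ranked(log_ratios, top_fraction).size
    if observed == 0:
        return float("nan")
    return expected_by_chance(n_genes, n_arrays, top_fraction) / observed
```

Because the rule uses only ranks pooled across all genes, no per-gene variance is estimated. With `log_ratios` as a genes-by-arrays matrix, `consistently_top_ranked(log_ratios, 0.05)` returns the flagged gene indices and `estimated_fdr(log_ratios, 0.05)` gives the rough chance-based error estimate.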
Supplementary data are available at Bioinformatics online.