Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA.
Bioinformatics. 2011 Jan 1;27(1):70-7. doi: 10.1093/bioinformatics/btq593. Epub 2010 Oct 22.
Functional enrichment analysis using primary genomics datasets is an emerging approach to complement established methods for functional enrichment based on predefined lists of functionally related genes. Currently used methods depend on creating lists of 'significant' and 'non-significant' genes based on ad hoc significance cutoffs. This can lead to loss of statistical power and can introduce biases affecting the interpretation of experimental results.
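To make the criticized cutoff step concrete, here is a minimal sketch of one common cutoff-based comparison. It assumes (hypothetically) that both datasets score the same genes with p-values. Each list is split into "significant" and "non-significant" at an ad hoc cutoff, and the overlap of the two significant lists is tested with a one-sided hypergeometric (Fisher's exact) test. The function name `overlap_pvalue` and the default cutoff of 0.05 are illustrative assumptions, not taken from the paper.

```python
from math import comb

def overlap_pvalue(pvals_a, pvals_b, cutoff=0.05):
    """Hypothetical cutoff-based overlap test (one-sided hypergeometric).

    Genes are labelled 'significant' in each dataset if p < cutoff; the
    overlap of the two significant lists is then tested for enrichment.
    """
    if len(pvals_a) != len(pvals_b):
        raise ValueError("both datasets must score the same genes")
    n = len(pvals_a)
    sig_a = {i for i, p in enumerate(pvals_a) if p < cutoff}
    sig_b = {i for i, p in enumerate(pvals_b) if p < cutoff}
    k_a, k_b = len(sig_a), len(sig_b)
    overlap = len(sig_a & sig_b)
    # P(X >= overlap), X ~ Hypergeometric(population n, k_a successes, k_b draws)
    total = comb(n, k_b)
    tail = sum(comb(k_a, x) * comb(n - k_a, k_b - x)
               for x in range(overlap, min(k_a, k_b) + 1))
    return overlap, tail / total
```

With the same scores, moving the cutoff from 0.05 to 0.025 changes the gene lists and the resulting p-value, which illustrates how the ad hoc categorization step alone can shift the conclusion.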
We developed and validated a new statistical framework, generalized random set (GRS) analysis, for comparing the genomic signatures in two datasets without the need for gene categorization. In our tests, GRS produced correct measures of statistical significance and showed a dramatic improvement in statistical power over other methods currently used in this setting. We also developed a procedure for identifying the genes driving the concordance of the genomic profiles and demonstrated a dramatic improvement in the functional coherence of the genes identified in such analysis.
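The abstract does not state the GRS formula, so the following is only a hypothetical sketch of the general idea of comparing two signatures without categorizing genes. It is not the CLEAN implementation. Gene-level scores `w` (dataset A) and `z` (dataset B) are combined into S = Σ wᵢzᵢ. S is then standardized by its exact mean and variance under random relabeling of genes, the same moment-based idea that random-set methods use. A normal approximation gives a one-sided p-value. The helper `driver_genes` ranks genes by their contribution to the centred statistic, which is one plausible way to flag genes driving the concordance. Both function names and the normal approximation are assumptions made for illustration.

```python
from math import erfc, sqrt

def concordance_z(w, z):
    """Hypothetical cutoff-free concordance of two gene-level signatures.

    Under random relabeling of genes, S = sum(w_i * z_i) has mean
    n * mean(w) * mean(z) and variance Sw * Sz / (n - 1), where Sw and Sz
    are the sums of squared deviations of w and z.
    """
    n = len(w)
    if n != len(z) or n < 3:
        raise ValueError("need two equal-length score vectors with n >= 3")
    w_bar, z_bar = sum(w) / n, sum(z) / n
    # S minus its permutation mean equals the sum of centred cross-products.
    centred = sum((wi - w_bar) * (zi - z_bar) for wi, zi in zip(w, z))
    ss_w = sum((wi - w_bar) ** 2 for wi in w)
    ss_z = sum((zi - z_bar) ** 2 for zi in z)
    var = ss_w * ss_z / (n - 1)
    if var == 0:
        raise ValueError("a constant signature carries no information")
    stat = centred / sqrt(var)
    # One-sided p-value for positive concordance (normal approximation).
    p_value = 0.5 * erfc(stat / sqrt(2))
    return stat, p_value

def driver_genes(w, z, names, top=5):
    """Hypothetical ranking of genes by contribution to the centred statistic."""
    n = len(w)
    w_bar, z_bar = sum(w) / n, sum(z) / n
    contrib = [((wi - w_bar) * (zi - z_bar), name)
               for wi, zi, name in zip(w, z, names)]
    contrib.sort(key=lambda t: t[0], reverse=True)
    return [name for _, name in contrib[:top]]
```

The standardized statistic equals the Pearson correlation of `w` and `z` multiplied by √(n − 1). Every gene therefore contributes through its full score rather than through a significant/non-significant label.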
GRS can be downloaded as part of the R package CLEAN from http://ClusterAnalysis.org/. An online implementation is available at http://GenomicsPortals.org/.