Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, USA.
BMC Bioinformatics. 2012 Jun 19;13:136. doi: 10.1186/1471-2105-13-136.
The identification of gene sets that are significantly impacted in a given condition, based on microarray data, is a crucial step in current life science research. Most gene set analysis methods treat all genes equally, regardless of how specific they are to a given gene set.
In this work we propose a new gene set analysis method that computes a gene set score as the mean of the absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize genes that appear in few gene sets over genes that appear in many gene sets. We demonstrate the usefulness of the method by analyzing gene sets that correspond to KEGG pathways; hence we call our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Most gene set analysis methods are validated by analyzing 2-3 data sets and then interpreting the results by hand. In contrast, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for potentially biased human assessment of the analysis results.
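The scoring idea can be illustrated with a minimal sketch. It assumes the moderated gene t-scores (for example, those produced by limma) are already computed and passed in as a dictionary. The abstract only says that genes in few gene sets are emphasized, so the specific weight formula used here, 1 + sqrt((f_max - f) / (f_max - f_min)), where f is the number of gene sets containing a gene, is an assumption: one monotone down-weighting scheme, not necessarily PADOG's exact rule. The function names `gene_weights` and `gene_set_scores` are hypothetical, and significance assessment of the scores is not shown.

```python
import math


def gene_weights(gene_sets):
    """Hypothetical down-weighting: a gene's weight falls from 2 (found in the
    fewest sets) to 1 (found in the most sets). Assumed formula, see lead-in."""
    freq = {}
    for genes in gene_sets.values():
        for g in set(genes):
            freq[g] = freq.get(g, 0) + 1
    f_max, f_min = max(freq.values()), min(freq.values())
    if f_max == f_min:
        # Every gene appears equally often: no down-weighting is possible.
        return {g: 1.0 for g in freq}
    return {g: 1.0 + math.sqrt((f_max - f) / (f_max - f_min))
            for g, f in freq.items()}


def gene_set_scores(t_scores, gene_sets):
    """Score each gene set as the mean of |w_g * t_g| over its measured genes."""
    # Keep only genes that have a t-score, so weights reflect measured genes.
    measured = {name: sorted(g for g in set(genes) if g in t_scores)
                for name, genes in gene_sets.items()}
    measured = {name: genes for name, genes in measured.items() if genes}
    w = gene_weights(measured)
    return {name: sum(abs(w[g] * t_scores[g]) for g in genes) / len(genes)
            for name, genes in measured.items()}


if __name__ == "__main__":
    # Toy data: g2 is shared by all three sets, so it is down-weighted.
    sets = {"A": ["g1", "g2"], "B": ["g2", "g3"], "C": ["g2", "g4"]}
    t = {"g1": 2.0, "g2": -3.0, "g3": 0.5, "g4": -1.0}
    print(gene_set_scores(t, sets))
```

In this toy example, g2 appears in every set and receives weight 1, while g1, g3 and g4 each appear in a single set and receive weight 2. The set-specific genes therefore contribute more to each score than they would in an unweighted mean of absolute t-scores.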
PADOG significantly improves gene set ranking and boosts the sensitivity of the analysis, using only information already available in the gene expression profiles and in the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be robust to changes in the database of gene sets being analyzed. PADOG is implemented as an R package, available at http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org.