College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China.
J Hum Genet. 2012 Oct;57(10):642-53. doi: 10.1038/jhg.2012.86. Epub 2012 Jul 12.
Compared with single-gene functional analysis, gene set analysis (GSA) can extract more information from gene expression profiles. Several gene set methods have been proposed, but most of them cannot detect gene sets that contain a large number of minor-effect genes. Here, we propose a novel distance-based gene set analysis method. If a gene set is significantly associated with a given phenotype, the distance between the two phenotype groups, computed from the expression of the genes in that set, should be larger. We calculated the distance between two groups with different phenotypes, estimated the significance (P-values) using two permutation methods and performed multiple hypothesis testing adjustments. The method was applied to one simulated data set and three real data sets. After comparison and literature verification, we determined that the gene resampling-based permutation method is more suitable for GSA, and that the centroid statistical and average linkage statistical distance methods are efficient, especially in detecting gene sets containing more minor-effect genes. We believe that this distance-based method will assist in finding functional gene sets that are significantly related to a complex trait. Additionally, we have prepared a simple and publicly available Perl and R package (http://bioinfo.hrbmu.edu.cn/dbgsa or http://cran.r-project.org/web/packages/DBGSA/).
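The distance-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the DBGSA package's implementation, and everything specific in it is an assumption made for illustration: the expression matrix is laid out as genes by samples, the two groups are sample groups split by phenotype, both distances are Euclidean, and the function names (`centroid_distance`, `average_linkage_distance`, `gene_resampling_pvalue`) are hypothetical. The gene resampling step is read here as drawing random gene sets of the same size from all measured genes and comparing their distances with the observed one.

```python
import numpy as np


def centroid_distance(expr, is_case, gene_idx):
    """Euclidean distance between the two group centroids over the gene set.

    expr     : (n_genes, n_samples) expression matrix (assumed layout)
    is_case  : boolean array of length n_samples marking one phenotype group
    gene_idx : indices of the genes belonging to the gene set
    """
    sub = expr[gene_idx, :]
    case_centroid = sub[:, is_case].mean(axis=1)
    ctrl_centroid = sub[:, ~is_case].mean(axis=1)
    return float(np.linalg.norm(case_centroid - ctrl_centroid))


def average_linkage_distance(expr, is_case, gene_idx):
    """Mean Euclidean distance over all between-group sample pairs."""
    sub = expr[gene_idx, :]
    case = sub[:, is_case].T   # (n_case, n_set_genes)
    ctrl = sub[:, ~is_case].T  # (n_ctrl, n_set_genes)
    diffs = case[:, None, :] - ctrl[None, :, :]
    return float(np.linalg.norm(diffs, axis=2).mean())


def gene_resampling_pvalue(expr, is_case, gene_idx, statistic,
                           n_perm=1000, seed=0):
    """Empirical P-value from random gene sets of the same size.

    Assumption: the null distribution is built by sampling genes without
    replacement from all rows of expr; the published method may differ.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(expr, is_case, gene_idx)
    n_genes, set_size = expr.shape[0], len(gene_idx)
    null = np.array([
        statistic(expr, is_case,
                  rng.choice(n_genes, size=set_size, replace=False))
        for _ in range(n_perm)
    ])
    # Add-one correction so the estimate is never exactly zero.
    return float((1 + np.sum(null >= observed)) / (1 + n_perm))


if __name__ == "__main__":
    # Simulated data: 200 genes, 20 samples, and a 20-gene set in which
    # every gene carries a modest shift in the second phenotype group.
    rng = np.random.default_rng(1)
    expr = rng.normal(size=(200, 20))
    is_case = np.arange(20) >= 10
    gene_set = np.arange(20)
    expr[:20, is_case] += 1.0

    for stat in (centroid_distance, average_linkage_distance):
        p = gene_resampling_pvalue(expr, is_case, gene_set, stat, n_perm=500)
        print(stat.__name__, p)
```

In a full analysis this would be repeated for every gene set, and the resulting P-values would then go through a multiple hypothesis testing adjustment, which the sketch omits.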