Computation of significance scores of unweighted Gene Set Enrichment Analyses.

Author information

Keller Andreas, Backes Christina, Lenhof Hans-Peter

Affiliation

Center for Bioinformatics, Saarland University, Building E1 1, 66804 Saarbrücken, Germany.

Publication information

BMC Bioinformatics. 2007 Aug 6;8:290. doi: 10.1186/1471-2105-8-290.

Abstract

BACKGROUND

Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. GSEA was originally developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulate at the top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time-consuming procedure that yields only estimates of the p-values.
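To make the permutation approach concrete, the following is a minimal sketch rather than the authors' implementation. It assumes one common unweighted formulation: walking down the sorted list, a running sum increases by m - l for each gene in the category and decreases by l otherwise (m genes in total, l in the category), and the test statistic is the maximum absolute deviation of that sum from zero. The function names `max_deviation` and `permutation_p_value` are hypothetical.

```python
import random


def max_deviation(in_category):
    """Maximum absolute running-sum deviation for a sorted gene list.

    in_category[i] is True if the i-th gene of the sorted list belongs
    to the category. The +(m - l) / -l step sizes are an assumed
    unweighted scheme; they make the running sum end at zero.
    """
    m = len(in_category)
    l = sum(in_category)
    running, best = 0, 0
    for hit in in_category:
        running += (m - l) if hit else -l
        best = max(best, abs(running))
    return best


def permutation_p_value(in_category, n_permutations=10000, seed=0):
    """Estimate the p-value by shuffling category labels.

    Returns the fraction of random label arrangements whose statistic
    is at least as large as the observed one. This is only an estimate
    and depends on the seed and on n_permutations.
    """
    rng = random.Random(seed)
    observed = max_deviation(in_category)
    labels = list(in_category)
    at_least = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        if max_deviation(labels) >= observed:
            at_least += 1
    return at_least / n_permutations
```

Calling `permutation_p_value` with different seeds on the same list generally returns slightly different values, which illustrates why the result is an estimate and why the cost grows with the number of permutations.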

RESULTS

We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, such as results that vary between runs because of the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets but also to expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.
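The abstract does not spell out the recurrence, so the following is only a sketch of how an exact p-value could be obtained by dynamic programming, not the published algorithm. It assumes the running sum steps by +(m - l) for category genes and -l for all others, with the maximum absolute deviation as the statistic. Under that assumption the running sum after i positions with k category genes is fixed at k(m - l) - (i - k)l, so it is enough to count the arrangements whose running sum stays strictly inside (-observed, observed) and subtract that fraction from 1. The names `exact_p_value` and `brute_force_p_value` are hypothetical; the brute-force version is included only as a cross-check on small inputs.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def exact_p_value(m, l, observed):
    """Exact P(max |running sum| >= observed) over all C(m, l) arrangements."""
    if observed <= 0:
        return Fraction(1)
    # counts[k]: number of prefixes of the current length with k category
    # genes whose running sum never reached +/- observed.
    counts = [0] * (l + 1)
    counts[0] = 1
    for i in range(1, m + 1):
        new_counts = [0] * (l + 1)
        # k must leave room for the remaining genes of both kinds.
        for k in range(max(0, i - (m - l)), min(i, l) + 1):
            running = k * (m - l) - (i - k) * l
            if abs(running) >= observed:
                continue  # this prefix already reaches the observed value
            ways = counts[k]  # position i holds a non-category gene
            if k > 0:
                ways += counts[k - 1]  # position i holds a category gene
            new_counts[k] = ways
        counts = new_counts
    return 1 - Fraction(counts[l], comb(m, l))


def brute_force_p_value(m, l, observed):
    """Same quantity by full enumeration; feasible only for tiny m."""
    hits = 0
    for positions in combinations(range(m), l):
        chosen = set(positions)
        running, best = 0, 0
        for i in range(m):
            running += (m - l) if i in chosen else -l
            best = max(best, abs(running))
        if best >= observed:
            hits += 1
    return Fraction(hits, comb(m, l))
```

This sketch uses O(m * l) additions and O(l) memory, and Python's arbitrary-precision integers keep the counts exact. For example, with m = 4, l = 2 and an observed deviation of 4, two of the six arrangements reach the observed value, so `exact_p_value(4, 2, 4)` returns `Fraction(1, 3)`.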
