Moment based gene set tests.

Authors

Larson Jessica L, Owen Art B

Affiliations

Department of Bioinformatics and Computational Biology, Genentech, Inc., South San Francisco, USA.

Currently at GenePeeks, Inc., Cambridge, USA.

Publication information

BMC Bioinformatics. 2015 Apr 28;16:132. doi: 10.1186/s12859-015-0571-7.

DOI: 10.1186/s12859-015-0571-7
PMID: 25928861
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4419444/
Abstract

BACKGROUND

Permutation-based gene set tests are standard approaches for testing relationships between collections of related genes and an outcome of interest in high throughput expression analyses. Using M random permutations, one can attain p-values as small as 1/(M+1). When many gene sets are tested, we need smaller p-values, hence larger M, to achieve significance while accounting for the number of simultaneous tests being made. As a result, the number of permutations to be done rises along with the cost per permutation. To reduce this cost, we seek parametric approximations to the permutation distributions for gene set tests.
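
The trade-off between the number of permutations M and the smallest attainable p-value can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `permutation_p_value` and `min_permutations`, the inner-product statistic, and the Bonferroni threshold are assumptions chosen for the example.

```python
import math
import random
from fractions import Fraction


def permutation_p_value(x, y, num_permutations, seed=0):
    """Monte Carlo permutation p-value for the statistic |sum_i x_i * y_i|.

    Counting the observed labelling as one of the M + 1 arrangements keeps
    the estimate at or above 1 / (M + 1).
    """
    rng = random.Random(seed)

    def stat(labels):
        return abs(sum(a * b for a, b in zip(x, labels)))

    observed = stat(y)
    shuffled = list(y)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(shuffled)  # random relabelling of the outcome
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (num_permutations + 1)


def min_permutations(alpha, num_tests):
    """Smallest M with 1 / (M + 1) <= alpha / num_tests (Bonferroni, assumed)."""
    # Fraction(str(alpha)) avoids floating-point error in the division.
    return math.ceil(num_tests / Fraction(str(alpha))) - 1
```

Under these assumptions, testing 1000 gene sets at alpha = 0.05 gives `min_permutations(0.05, 1000) == 19999`, so roughly 20,000 permutations per set are needed before any adjusted p-value can reach significance.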

RESULTS

We study two gene set methods based on sums and sums of squared correlations. The statistics we study are among the best performers in the extensive simulation of 261 gene set methods by Ackermann and Strimmer in 2009. Our approach calculates exact relevant moments of these statistics and uses them to fit parametric distributions. The computational cost of our algorithm for the linear case is on the order of doing |G| permutations, where |G| is the number of genes in set G. For the quadratic statistics, the cost is on the order of |G|^2 permutations, which can still be orders of magnitude faster than plain permutation sampling. We applied the permutation approximation method to three public Parkinson's Disease expression datasets and discovered enriched gene sets not previously discussed. We found that the moment-based gene set enrichment p-values closely approximate the permutation method p-values at a tiny fraction of their cost. They also gave nearly identical rankings to the gene sets being compared.
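
To illustrate why exact moments can replace sampling for a linear statistic, the sketch below uses the classical closed-form permutation mean and variance of T = sum_i a_i * y_pi(i) and a normal approximation for the tail. It is an illustration under stated assumptions rather than the npGSEA implementation: the abstract does not say which parametric families are fitted, so the normal distribution, the function names, and the per-sample score `a` (for example, the summed centered expression of the genes in G) are hypothetical choices.

```python
import itertools
import math


def permutation_moments(a, y):
    """Exact mean and variance of sum_i a_i * y_pi(i) over all permutations pi.

    Closed form: mean = n * a_bar * y_bar,
                 var  = SS_a * SS_y / (n - 1),
    where SS denotes the sum of squared deviations from the mean.
    """
    n = len(a)
    a_bar = sum(a) / n
    y_bar = sum(y) / n
    ss_a = sum((ai - a_bar) ** 2 for ai in a)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return n * a_bar * y_bar, ss_a * ss_y / (n - 1)


def brute_force_moments(a, y):
    """Same moments by enumerating all n! permutations (tiny n only)."""
    values = [sum(ai * yi for ai, yi in zip(a, perm))
              for perm in itertools.permutations(y)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def normal_upper_tail(t, mean, var):
    """Approximate P(T >= t) with a normal law matched to the two moments."""
    z = (t - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))
```

The closed form needs a single pass over the samples, whereas `brute_force_moments` grows factorially. The two agree exactly, which is what allows a fitted distribution to stand in for sampled permutations.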

CONCLUSIONS

We have developed a moment based approximation to linear and quadratic gene set test statistics' permutation distribution. This allows approximate testing to be done orders of magnitude faster than one could do by sampling permutations. We have implemented our method as a publicly available Bioconductor package, npGSEA (www.bioconductor.org).

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/13d2/4419444/d0ecc1b48acc/12859_2015_571_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/13d2/4419444/d03524d9e053/12859_2015_571_Fig2_HTML.jpg

Similar articles

1
Moment based gene set tests.
BMC Bioinformatics. 2015 Apr 28;16:132. doi: 10.1186/s12859-015-0571-7.
2
Fast approximation of small p-values in permutation tests by partitioning the permutations.
Biometrics. 2018 Mar;74(1):196-206. doi: 10.1111/biom.12731. Epub 2017 May 18.
3
Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn.
Stat Appl Genet Mol Biol. 2010;9:Article39. doi: 10.2202/1544-6115.1585. Epub 2010 Oct 31.
4
Faster permutation inference in brain imaging.
Neuroimage. 2016 Nov 1;141:502-516. doi: 10.1016/j.neuroimage.2016.05.068. Epub 2016 Jun 7.
5
Accurate and fast small p-value estimation for permutation tests in high-throughput genomic data analysis with the cross-entropy method.
Stat Appl Genet Mol Biol. 2023 Aug 25;22(1). doi: 10.1515/sagmb-2021-0067. eCollection 2023 Jan 1.
6
Conservative adjustment of permutation p-values when the number of permutations is limited.
Int J Bioinform Res Appl. 2007;3(4):536-46. doi: 10.1504/IJBRA.2007.015420.
7
PERMORY: an LD-exploiting permutation test algorithm for powerful genome-wide association testing.
Bioinformatics. 2010 Sep 1;26(17):2093-100. doi: 10.1093/bioinformatics/btq399. Epub 2010 Jul 6.
8
Gene set analysis: limitations in popular existing methods and proposed improvements.
Bioinformatics. 2014 Oct;30(19):2747-56. doi: 10.1093/bioinformatics/btu374. Epub 2014 Jun 5.
9
Fewer permutations, more accurate P-values.
Bioinformatics. 2009 Jun 15;25(12):i161-8. doi: 10.1093/bioinformatics/btp211.
10
Permutations of functional magnetic resonance imaging classification may not be normally distributed.
Stat Methods Med Res. 2017 Dec;26(6):2567-2585. doi: 10.1177/0962280215601707.

Cited by

1
Roastgsa: a comparison of rotation-based scores for gene set enrichment analysis.
BMC Bioinformatics. 2023 Oct 30;24(1):408. doi: 10.1186/s12859-023-05510-x.
2
SEMgsa: topology-based pathway enrichment analysis with structural equation models.
BMC Bioinformatics. 2022 Aug 17;23(1):344. doi: 10.1186/s12859-022-04884-8.
3
Patient-derived xenografts undergo mouse-specific tumor evolution.
Nat Genet. 2017 Nov;49(11):1567-1575. doi: 10.1038/ng.3967. Epub 2017 Oct 9.
4
Overlapping Group Logistic Regression with Applications to Genetic Pathway Selection.
Cancer Inform. 2016 Sep 15;15:179-87. doi: 10.4137/CIN.S40043. eCollection 2016.
5
Bioconductor's EnrichmentBrowser: seamless navigation through combined results of set- & network-based enrichment analysis.
BMC Bioinformatics. 2016 Jan 20;17:45. doi: 10.1186/s12859-016-0884-1.

References

1
PINK1 regulates histone H3 trimethylation and gene expression by interaction with the polycomb protein EED/WAIT1.
Proc Natl Acad Sci U S A. 2013 Sep 3;110(36):14729-34. doi: 10.1073/pnas.1216844110. Epub 2013 Aug 19.
2
Efficient Moments-based Permutation Tests.
Adv Neural Inf Process Syst. 2009;22:2277-2285.
3
Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn.
Stat Appl Genet Mol Biol. 2010;9:Article39. doi: 10.2202/1544-6115.1585. Epub 2010 Oct 31.
4
ROAST: rotation gene set tests for complex microarray experiments.
Bioinformatics. 2010 Sep 1;26(17):2176-82. doi: 10.1093/bioinformatics/btq401. Epub 2010 Jul 7.
5
Fewer permutations, more accurate P-values.
Bioinformatics. 2009 Jun 15;25(12):i161-8. doi: 10.1093/bioinformatics/btp211.
6
Serotonin and Parkinson's disease: On movement, mood, and madness.
Mov Disord. 2009 Jul 15;24(9):1255-66. doi: 10.1002/mds.22473.
7
Innate and adaptive immunity for the pathobiology of Parkinson's disease.
Antioxid Redox Signal. 2009 Sep;11(9):2151-66. doi: 10.1089/ars.2009.2460.
8
A general modular framework for gene set enrichment analysis.
BMC Bioinformatics. 2009 Feb 3;10:47. doi: 10.1186/1471-2105-10-47.
9
Analyzing gene expression data in terms of gene sets: methodological issues.
Bioinformatics. 2007 Apr 15;23(8):980-7. doi: 10.1093/bioinformatics/btm051. Epub 2007 Feb 15.
10
Molecular markers of early Parkinson's disease based on gene expression in blood.
Proc Natl Acad Sci U S A. 2007 Jan 16;104(3):955-60. doi: 10.1073/pnas.0610204104. Epub 2007 Jan 10.