An Independent Filter for Gene Set Testing Based on Spectral Enrichment.

Author information

Frost H Robert, Li Zhigang, Asselbergs Folkert W, Moore Jason H

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2015 Sep-Oct;12(5):1076-86. doi: 10.1109/TCBB.2015.2415815.

Abstract

Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. As its filter statistic, the SGSF method uses the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters out gene sets unrelated to the experimental outcome, resulting in significantly increased gene set testing power.
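
The two-stage idea described in the abstract (filter first, then test only the retained gene sets) can be illustrated with a minimal sketch. This is not the authors' SGSF implementation: the filter p-values are taken as given inputs, the SGSF computation from principal components and eigenvalues is not shown, and the function names, the filter cutoff, and the use of the Benjamini-Hochberg correction in the second stage are all assumptions made for illustration.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def filter_then_test(filter_pvals, test_pvals, filter_cutoff=0.05, fdr=0.05):
    """Hypothetical two-stage procedure: filter gene sets, then correct.

    filter_pvals: per-gene-set filter statistics (assumed precomputed and
                  independent of the test statistics under the null).
    test_pvals:   per-gene-set p-values from a standard gene set test.
    """
    filter_pvals = np.asarray(filter_pvals, dtype=float)
    test_pvals = np.asarray(test_pvals, dtype=float)

    # Stage 1: keep only gene sets that pass the filter.
    keep = filter_pvals <= filter_cutoff

    # Stage 2: multiple-testing correction over the retained sets only.
    adjusted = np.full(test_pvals.shape, np.nan)
    rejected = np.zeros(test_pvals.shape, dtype=bool)
    if keep.any():
        adjusted[keep] = benjamini_hochberg(test_pvals[keep])
        rejected[keep] = adjusted[keep] <= fdr
    return keep, adjusted, rejected


# Toy inputs (made up for illustration, not data from the paper).
filter_p = [0.01, 0.5, 0.02, 0.9]
test_p = [0.03, 0.6, 0.04, 0.8]
keep, adjusted, rejected = filter_then_test(filter_p, test_p)
print("kept:", keep)
print("adjusted:", adjusted)
print("rejected:", rejected)
print("unfiltered BH:", benjamini_hochberg(test_p))
```

In this toy example, the correction is applied to two hypotheses instead of four, so the retained gene sets face a smaller multiple-testing penalty. The validity of this kind of procedure rests on the independence property stated in the abstract; a filter that is correlated with the test statistic under the null would not preserve the type I error rate.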

Similar articles

1
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
IEEE/ACM Trans Comput Biol Bioinform. 2015 Sep-Oct;12(5):1076-86. doi: 10.1109/TCBB.2015.2415815.
2
Spectral gene set enrichment (SGSE).
BMC Bioinformatics. 2015 Mar 3;16:70. doi: 10.1186/s12859-015-0490-7.
3
A blocking strategy to improve gene selection for classification of gene expression data.
IEEE/ACM Trans Comput Biol Bioinform. 2007 Apr-Jun;4(2):293-300. doi: 10.1109/TCBB.2007.1014.
4
Clustering by soft-constraint affinity propagation: applications to gene-expression data.
Bioinformatics. 2007 Oct 15;23(20):2708-15. doi: 10.1093/bioinformatics/btm414. Epub 2007 Sep 25.
5
Construction of a reference gene association network from multiple profiling data: application to data analysis.
Bioinformatics. 2007 Oct 15;23(20):2716-24. doi: 10.1093/bioinformatics/btm423. Epub 2007 Sep 10.
6
Improving protein protein interaction prediction based on phylogenetic information using a least-squares support vector machine.
Ann N Y Acad Sci. 2007 Dec;1115:154-67. doi: 10.1196/annals.1407.005. Epub 2007 Oct 9.
7
Enrichment analysis in high-throughput genomics - accounting for dependency in the NULL.
Brief Bioinform. 2007 Mar;8(2):71-7. doi: 10.1093/bib/bbl019. Epub 2006 Oct 31.
8
Supervised inference of gene-regulatory networks.
BMC Bioinformatics. 2008 Jan 4;9:2. doi: 10.1186/1471-2105-9-2.
9
Fitting a geometric graph to a protein-protein interaction network.
Bioinformatics. 2008 Apr 15;24(8):1093-9. doi: 10.1093/bioinformatics/btn079. Epub 2008 Mar 14.
10
Essential latent knowledge for protein-protein interactions: analysis by an unsupervised learning approach.
IEEE/ACM Trans Comput Biol Bioinform. 2005 Apr-Jun;2(2):119-30. doi: 10.1109/TCBB.2005.23.

Cited by

1
Riemannian Variance Filtering: An Independent Filtering Scheme for Statistical Tests on Manifold-valued Data.
Conf Comput Vis Pattern Recognit Workshops. 2017 Jul;2017:699-708. doi: 10.1109/CVPRW.2017.99. Epub 2017 Aug 24.
2
Computation and application of tissue-specific gene set weights.
Bioinformatics. 2018 Sep 1;34(17):2957-2964. doi: 10.1093/bioinformatics/bty217.
3
Unsupervised gene set testing based on random matrix theory.
BMC Bioinformatics. 2016 Nov 4;17(1):442. doi: 10.1186/s12859-016-1299-8.

References

1
Principal component gene set enrichment (PCGSE).
BioData Min. 2015 Aug 19;8:25. doi: 10.1186/s13040-015-0059-z. eCollection 2015.
2
Spectral gene set enrichment (SGSE).
BMC Bioinformatics. 2015 Mar 3;16:70. doi: 10.1186/s12859-015-0490-7.
3
α5β1 integrin signaling mediates oxidized low-density lipoprotein-induced inflammation and early atherosclerosis.
Arterioscler Thromb Vasc Biol. 2014 Jul;34(7):1362-73. doi: 10.1161/ATVBAHA.114.303863. Epub 2014 May 15.
4
How to get the most from microarray data: advice from reverse genomics.
BMC Genomics. 2014 Mar 21;15:223. doi: 10.1186/1471-2164-15-223.
5
Optimization of gene set annotations via entropy minimization over variable clusters (EMVC).
Bioinformatics. 2014 Jun 15;30(12):1698-706. doi: 10.1093/bioinformatics/btu110. Epub 2014 Feb 25.
6
Molecular biology of atherosclerosis.
Physiol Rev. 2013 Jul;93(3):1317-542. doi: 10.1152/physrev.00004.2012.
7
Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction.
Biometrika. 2012 Dec;99(4):929-944. doi: 10.1093/biomet/ass044. Epub 2012 Sep 25.
9
NCBI GEO: archive for functional genomics data sets--update.
Nucleic Acids Res. 2013 Jan;41(Database issue):D991-5. doi: 10.1093/nar/gks1193. Epub 2012 Nov 27.
10
Camera: a competitive gene set test accounting for inter-gene correlation.
Nucleic Acids Res. 2012 Sep 1;40(17):e133. doi: 10.1093/nar/gks461. Epub 2012 May 25.
