Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

Publication information

BMC Genomics. 2014;15 Suppl 1(Suppl 1):S6. doi: 10.1186/1471-2164-15-S1-S6. Epub 2014 Jan 24.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4046697/

Abstract

BACKGROUND

Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment.

METHODS

We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets.
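The two computational steps named above, posterior probabilities under a three-component normal mixture and a false discovery rate used for ranking, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each latent state (no change, positive change, negative change) is shared across all data sets, that z-scores are independent across data sets within a state (diagonal covariance), and that the mixture parameters are already estimated. The FDR step uses the common posterior-averaging estimate (mean of 1 - p over the selected list), which may differ from the paper's exact formula. The function names, parameter values, and gene set names are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def component_posteriors(z_scores, weights, means, sds):
    """Posterior probability of each latent state for one gene.

    z_scores   : one z-score per data set for this gene
    weights[c] : mixing proportion of state c
    means[c][k], sds[c][k] : normal parameters of state c in data set k
    (diagonal covariance is an assumption of this sketch)
    """
    joint = []
    for w, mu, sd in zip(weights, means, sds):
        density = w
        for z, m, s in zip(z_scores, mu, sd):
            density *= normal_pdf(z, m, s)
        joint.append(density)
    total = sum(joint)
    return [j / total for j in joint]


def rank_with_fdr(enrichment_probs):
    """Rank gene sets by enrichment probability, highest first.

    Returns (name, probability, fdr) tuples, where fdr is the average of
    (1 - probability) over all sets ranked at or above that position.
    """
    ranked = sorted(enrichment_probs.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    running_error = 0.0
    for rank, (name, prob) in enumerate(ranked, start=1):
        running_error += 1.0 - prob
        results.append((name, prob, running_error / rank))
    return results


if __name__ == "__main__":
    # Hypothetical parameters for two data sets and three states:
    # index 0 = no change, 1 = positive change, 2 = negative change.
    weights = [0.8, 0.1, 0.1]
    means = [[0.0, 0.0], [2.0, 2.0], [-2.0, -2.0]]
    sds = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    print(component_posteriors([2.5, 3.0], weights, means, sds))

    # Hypothetical concordant-enrichment probabilities for three gene sets.
    for row in rank_with_fdr({"SET_A": 0.95, "SET_B": 0.60, "SET_C": 0.85}):
        print(row)
```

The sketch takes the gene-set-level enrichment probabilities as given inputs, because the abstract does not spell out how they are derived from the gene-level mixture model.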

RESULTS

We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. An analysis of the first two data sets was conducted to compare our result with a previously published result obtained by running GSEA separately on each data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and of all three data sets. Both analyses identified many gene sets with low false discovery rates, and the two sets of results were consistent with each other. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method.

CONCLUSIONS

This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6951/4046697/6f507c66a750/12864_2014_5679_Fig1_HTML.jpg
