A Bayesian model for cross-study differential gene expression.

Authors

Scharpf Robert B, Tjelmeland Håkon, Parmigiani Giovanni, Nobel Andrew B

Affiliation

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205.

Publication information

J Am Stat Assoc. 2009;104(488):1295-1310. doi: 10.1198/jasa.2009.ap07611.

Abstract

In this paper we define a hierarchical Bayesian model for microarray expression data collected from several studies and use it to identify genes that show differential expression between two conditions. Key features include shrinkage across both genes and studies, and flexible modeling that allows for interactions between platforms and the estimated effect, as well as concordant and discordant differential expression across studies. We evaluated the performance of our model comprehensively, using both artificial data and a "split-study" validation approach that provides an agnostic assessment of the model's behavior not only under the null hypothesis but also under a realistic alternative. The simulation results from the artificial data demonstrate the advantages of the Bayesian model: its 1 - AUC values are roughly half of the corresponding values for a direct combination of t- and SAM-statistics. Furthermore, the simulations provide guidelines for when the Bayesian model is most likely to be useful. Most notably, in small studies the Bayesian model generally outperforms other methods when evaluated by AUC, FDR, and MDR across a range of simulation parameters, and this difference diminishes for larger sample sizes in the individual studies. The split-study validation illustrates appropriate shrinkage of the Bayesian model in the absence of the platform, sample, and annotation differences that otherwise complicate analyses of experimental data. Finally, we fit our model to four breast cancer studies employing different technologies (cDNA and Affymetrix) to estimate differential expression in estrogen receptor positive tumors versus negative ones. Software and data for reproducing our analysis are publicly available.
