Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model.

Authors

Wei Peng, Pan Wei

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building (MMC 303), Minneapolis, MN 55455-0378, USA.

Publication information

Bioinformatics. 2008 Feb 1;24(3):404-11. doi: 10.1093/bioinformatics/btm612. Epub 2007 Dec 14.

DOI:10.1093/bioinformatics/btm612

PMID:18083717

Abstract

MOTIVATION

It is a common task in genomic studies to identify a subset of the genes satisfying certain conditions, such as differentially expressed genes or regulatory target genes of a transcription factor (TF). This can be formulated as a statistical hypothesis testing problem. Most existing approaches treat the genes as having an identical and independent distribution a priori, testing each gene independently or testing some subsets of the genes one by one. On the other hand, it is known that the genes work coordinately as dictated by gene networks. Treating genes equally and independently ignores the important information contained in gene networks, leading to inefficient analysis and reduced power.

RESULTS

We propose incorporating gene network information into the statistical analysis of genomic data. Specifically, rather than treating the genes equally and independently a priori, as in a standard mixture model, we assume that the gene-specific prior probabilities are correlated as induced by a gene network: genes are allowed to have different prior probabilities, but genes that are neighbors in the network have similar prior probabilities, reflecting their shared biological functions. We applied both the standard mixture model and the proposed method to a real ChIP-chip dataset (and to simulated data) to identify the transcriptional target genes of the TF GCN4. The new method was more powerful in discovering the target genes.
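To make the contrast between the two models concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-component normal mixture on per-gene z-scores (null N(0, 1), alternative N(alt_mean, 1)) and replaces the paper's spatially correlated prior with a crude heuristic: each gene's prior is repeatedly set to a blend of a global prior and the mean posterior of its network neighbors. The function names, the 50/50 blend, and the parameter values are all hypothetical choices made for illustration.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_alt(z, prior, alt_mean=2.0):
    """Posterior probability that a gene belongs to the non-null component."""
    f0 = normal_pdf(z, 0.0)       # null density (assumed N(0, 1))
    f1 = normal_pdf(z, alt_mean)  # alternative density (assumed N(alt_mean, 1))
    return prior * f1 / (prior * f1 + (1.0 - prior) * f0)


def standard_mixture(scores, prior=0.2, alt_mean=2.0):
    """Standard mixture model: every gene shares the same prior probability."""
    return {g: posterior_alt(z, prior, alt_mean) for g, z in scores.items()}


def network_mixture(scores, neighbors, prior=0.2, alt_mean=2.0, n_iter=20):
    """Heuristic network-aware variant (an assumption, not the paper's model).

    Each gene's prior is a 50/50 blend of the global prior and the mean
    current posterior of its neighbors, so neighboring genes end up with
    similar priors. Genes without neighbors keep the global prior.
    """
    post = standard_mixture(scores, prior, alt_mean)
    for _ in range(n_iter):
        updated = {}
        for gene, z in scores.items():
            nbrs = neighbors.get(gene, [])
            if nbrs:
                local = sum(post[n] for n in nbrs) / len(nbrs)
                gene_prior = 0.5 * prior + 0.5 * local
            else:
                gene_prior = prior
            updated[gene] = posterior_alt(z, gene_prior, alt_mean)
        post = updated
    return post


if __name__ == "__main__":
    # Toy data: gene A has a borderline score but two strongly scoring neighbors;
    # gene D has the same score and no neighbors.
    scores = {"A": 1.5, "B": 3.0, "C": 3.2, "D": 1.5}
    neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"], "D": []}
    print(standard_mixture(scores))
    print(network_mixture(scores, neighbors))
```

In this toy setup, genes A and D receive identical posteriors under the standard model, while the network-aware variant raises A's posterior because its neighbors score highly and leaves the isolated gene D unchanged. A faithful implementation would instead place a spatially correlated prior on the gene-specific probabilities and estimate it from the data, as the abstract describes.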

Similar articles

Transcriptome network component analysis with limited microarray data.

Bioinformatics. 2006 Aug 1;22(15):1886-94. doi: 10.1093/bioinformatics/btl279. Epub 2006 Jun 9.

Construction of a reference gene association network from multiple profiling data: application to data analysis.

Bioinformatics. 2007 Oct 15;23(20):2716-24. doi: 10.1093/bioinformatics/btm423. Epub 2007 Sep 10.

Inferring gene regulatory networks from multiple microarray datasets.

Bioinformatics. 2006 Oct 1;22(19):2413-20. doi: 10.1093/bioinformatics/btl396. Epub 2006 Jul 24.

Tail posterior probability for inference in pairwise and multiclass gene expression data.

Biometrics. 2007 Dec;63(4):1117-25. doi: 10.1111/j.1541-0420.2007.00807.x.

Time-varying modeling of gene expression regulatory networks using the wavelet dynamic vector autoregressive method.

Bioinformatics. 2007 Jul 1;23(13):1623-30. doi: 10.1093/bioinformatics/btm151. Epub 2007 Apr 26.

Incorporating gene functions as priors in model-based clustering of microarray gene expression data.

Bioinformatics. 2006 Apr 1;22(7):795-801. doi: 10.1093/bioinformatics/btl011. Epub 2006 Jan 24.

Comparing association network algorithms for reverse engineering of large-scale gene regulatory networks: synthetic versus real data.

Bioinformatics. 2007 Jul 1;23(13):1640-7. doi: 10.1093/bioinformatics/btm163. Epub 2007 May 7.

A Gibbs sampler for the identification of gene expression and network connectivity consistency.

Bioinformatics. 2006 Dec 15;22(24):3040-6. doi: 10.1093/bioinformatics/btl541. Epub 2006 Oct 23.

MMG: a probabilistic tool to identify submodules of metabolic pathways.

Bioinformatics. 2008 Apr 15;24(8):1078-84. doi: 10.1093/bioinformatics/btn066. Epub 2008 Feb 21.

Cited by

Bioinformatics analysis combined with untargeted metabolomics reveals lipid metabolism-related genes and their biological markers in chronic spontaneous urticaria.

Front Genet. 2025 Aug 18;16:1550205. doi: 10.3389/fgene.2025.1550205. eCollection 2025.

A comparative study of statistical methods for identifying differentially expressed genes in spatial transcriptomics.

bioRxiv. 2025 Feb 22:2025.02.17.638726. doi: 10.1101/2025.02.17.638726.

Feature selection and classification over the network with missing node observations.

Stat Med. 2022 Mar 30;41(7):1242-1262. doi: 10.1002/sim.9267. Epub 2021 Nov 23.

IMIX: a multivariate mixture model approach to association analysis through multi-omics data integration.

Bioinformatics. 2021 Apr 1;36(22-23):5439-5447. doi: 10.1093/bioinformatics/btaa1001.

Integrating gene regulatory pathways into differential network analysis of gene expression data.

Sci Rep. 2019 Apr 2;9(1):5479. doi: 10.1038/s41598-019-41918-3.

A graph-embedded deep feedforward network for disease outcome classification and feature selection using gene expression data.

Bioinformatics. 2018 Nov 1;34(21):3727-3737. doi: 10.1093/bioinformatics/bty429.

Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors.

BMC Bioinformatics. 2017 Dec 28;18(Suppl 14):552. doi: 10.1186/s12859-017-1893-4.

F-MAP: A Bayesian approach to infer the gene regulatory network using external hints.

PLoS One. 2017 Sep 22;12(9):e0184795. doi: 10.1371/journal.pone.0184795. eCollection 2017.

Incorporating interaction networks into the determination of functionally related hit genes in genomic experiments with Markov random fields.

Bioinformatics. 2017 Jul 15;33(14):i170-i179. doi: 10.1093/bioinformatics/btx244.

Enhanced construction of gene regulatory networks using hub gene information.

BMC Bioinformatics. 2017 Mar 23;18(1):186. doi: 10.1186/s12859-017-1576-1.
