Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model.

Author information

Wei Peng, Pan Wei

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building (MMC 303), Minneapolis, MN 55455-0378, USA.

Publication information

Bioinformatics. 2008 Feb 1;24(3):404-11. doi: 10.1093/bioinformatics/btm612. Epub 2007 Dec 14.

Abstract

MOTIVATION

It is a common task in genomic studies to identify a subset of the genes satisfying certain conditions, such as differentially expressed genes or regulatory target genes of a transcription factor (TF). This can be formulated as a statistical hypothesis testing problem. Most existing approaches treat the genes as having an identical and independent distribution a priori, testing each gene independently or testing some subsets of the genes one by one. On the other hand, it is known that the genes work coordinately as dictated by gene networks. Treating genes equally and independently ignores the important information contained in gene networks, leading to inefficient analysis and reduced power.

RESULTS

We propose incorporating gene network information into the statistical analysis of genomic data. Specifically, rather than treating the genes equally and independently a priori, as in a standard mixture model, we assume that gene-specific prior probabilities are correlated as induced by a gene network: the genes are allowed to have different prior probabilities, but genes that are neighbors in the network have similar prior probabilities, reflecting their shared biological functions. We applied both approaches, the standard mixture model and the proposed network-based model, to a real ChIP-chip dataset (and to simulated data) to identify the transcriptional target genes of TF GCN4. The new method was found to be more powerful in discovering the target genes.
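
The abstract does not give the model's equations, so the following is only a minimal Python sketch of the contrast it describes, not the authors' estimation procedure. It compares a two-component mixture with one prior shared by all genes against the same mixture with gene-specific priors pulled toward the evidence of network neighbors. The normal densities, the baseline prior of 0.2, the blending heuristic, the toy data and all function names are assumptions made for illustration.

```python
import numpy as np

def normal_pdf(z, mean, sd):
    """Density of a normal distribution evaluated at z."""
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def posterior_target(z, prior_target, alt_mean=2.0, alt_sd=1.0):
    """Posterior probability of the 'target' component in a two-component mixture.

    prior_target may be a scalar (standard model: one prior shared by all genes)
    or an array with one prior per gene (network-informed model).
    """
    f0 = normal_pdf(z, 0.0, 1.0)          # assumed null density
    f1 = normal_pdf(z, alt_mean, alt_sd)  # assumed alternative density
    weighted_alt = prior_target * f1
    return weighted_alt / (weighted_alt + (1.0 - prior_target) * f0)

def network_informed_prior(baseline_prior, evidence, adjacency, weight=0.5):
    """Illustrative heuristic: blend the baseline prior with the mean
    evidence of each gene's network neighbors."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1)
    neighbor_mean = adjacency @ evidence / np.maximum(degree, 1.0)
    blended = (1.0 - weight) * baseline_prior + weight * neighbor_mean
    # Genes with no neighbors keep the baseline prior.
    return np.where(degree > 0, blended, baseline_prior)

# Toy data (made up): genes 0-2 form one connected group, genes 3-4 another.
z = np.array([3.0, 2.8, 1.2, 0.1, -0.2])
adjacency = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

standard = posterior_target(z, 0.2)                       # shared prior
prior = network_informed_prior(0.2, standard, adjacency)  # gene-specific priors
network = posterior_target(z, prior)

print(np.round(standard, 3))
print(np.round(network, 3))
```

In this toy setup, gene 2 has a borderline statistic but two strongly supported neighbors, so its network-informed posterior is higher than under the shared prior. Gene 3, whose only neighbor shows little evidence, moves the other way.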

