Incorporating biological information as a prior in an empirical Bayes approach to analyzing microarray data.

Author information

Pan Wei

Affiliation

University of Minnesota, USA.

Publication information

Stat Appl Genet Mol Biol. 2005;4:Article12. doi: 10.2202/1544-6115.1124. Epub 2005 May 25.

DOI:10.2202/1544-6115.1124

PMID:16646829

Abstract

Currently, existing biological knowledge is used in analyzing high-throughput genomic and proteomic data mainly for validation. Here we take a different approach, incorporating biological knowledge into the statistical analysis to improve statistical power and efficiency. Specifically, we consider how to fuse biological information into a mixture model to analyze microarray data. A standard mixture model assumes that all genes come from the same (marginal) distribution, including an equal prior probability of an event such as being differentially expressed or being bound by a transcription factor (TF). In contrast, our proposed mixture model allows genes in different groups to have different distributions, with the grouping of the genes reflecting biological information. Using a list of about 800 putative cell cycle-regulated genes as prior biological knowledge, we analyze genome-wide location data to detect binding sites of the TF Fkh1. We find that our proposal improves on the standard approach, yielding reduced false discovery rates (FDR), and hence is a useful alternative to current practice.
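
The idea of group-specific prior probabilities can be illustrated with a minimal sketch. The abstract does not state the component densities or the fitting procedure, so everything below is an assumption made for illustration: a two-component normal mixture on z-scores with a fixed N(0, 1) null, a shared normal alternative, one prior probability per gene group, and a plain EM fit. The function names (`normal_pdf`, `fit_stratified_mixture`), the simulated data, and the 0.9 posterior cutoff are hypothetical and are not taken from the paper.

```python
import numpy as np


def normal_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_stratified_mixture(z, group, n_iter=300):
    """EM fit of f_g(z) = (1 - pi_g) * N(0, 1) + pi_g * N(mu, sigma^2).

    Assumption for this sketch: only the prior probability pi_g differs
    between gene groups; the null and alternative densities are shared.
    """
    z = np.asarray(z, dtype=float)
    labels, idx = np.unique(np.asarray(group), return_inverse=True)
    pi = np.full(labels.size, 0.1)   # one prior event probability per group
    mu, sigma = 2.0, 1.0             # starting values for the alternative
    f0 = normal_pdf(z, 0.0, 1.0)     # null density, held fixed

    def posterior():
        # Posterior probability that each gene has the event.
        f1 = normal_pdf(z, mu, sigma)
        prior = pi[idx]
        return prior * f1 / (prior * f1 + (1.0 - prior) * f0)

    for _ in range(n_iter):
        post = posterior()                                       # E-step
        pi = np.bincount(idx, weights=post) / np.bincount(idx)   # M-step: per-group prior
        w = post.sum()
        mu = (post * z).sum() / w                                # M-step: shared alternative
        sigma = max(np.sqrt((post * (z - mu) ** 2).sum() / w), 1e-3)

    return dict(zip(labels.tolist(), pi.tolist())), mu, sigma, posterior()


# Simulated data (hypothetical): group 1 plays the role of a prior gene list
# in which events are assumed to be more common than in the background.
rng = np.random.default_rng(0)
n_bg, n_prior = 5000, 800
group = np.r_[np.zeros(n_bg, dtype=int), np.ones(n_prior, dtype=int)]
true_rate = np.where(group == 1, 0.30, 0.02)
is_event = rng.random(group.size) < true_rate
z = rng.normal(loc=np.where(is_event, 3.0, 0.0), scale=1.0)

pi_hat, mu_hat, sigma_hat, post = fit_stratified_mixture(z, group)

# Model-based FDR estimate for the genes called at the chosen cutoff:
# the average posterior null probability among the calls.
called = post > 0.9
est_fdr = float(np.mean(1.0 - post[called]))
print(pi_hat, round(est_fdr, 3))
```

In this sketch the same z-score yields a higher posterior probability for a gene in the group with the larger estimated prior, which is one way that grouping information could change the set of calls relative to a single pooled prior. Whether this matches the paper's actual model cannot be determined from the abstract alone.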
