School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK.
BMC Bioinformatics. 2009 Sep 23;10:310. doi: 10.1186/1471-2105-10-310.
The Audic-Claverie method [1] has been, and continues to be, a popular approach for detecting differentially expressed genes in the SAGE framework. The method assumes that, under the null hypothesis, the tag counts of the same gene in two libraries come from the same but unknown Poisson distribution. The problem is that each SAGE library represents only a single measurement. We ask: given that the tag count samples from SAGE libraries are extremely limited, how useful is the Audic-Claverie methodology in practice? We rigorously analyze the A-C statistic, which forms the backbone of the methodology and represents our knowledge of the underlying tag-generating process based on a single observation.
We show that the A-C statistic and the underlying Poisson distribution of the tag counts share the same mode structure. Moreover, the K-L divergence from the true unknown Poisson distribution to the A-C statistic is minimized when the A-C statistic is conditioned on the mode of the Poisson distribution. Most importantly, the expectation of this K-L divergence never exceeds 1/2 bit.
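The mode and divergence properties stated here can be explored numerically. The following is a minimal Python sketch, not the authors' code. It assumes the equal-library-size form of the A-C statistic, p(y|x) = (x+y)! / (x! y! 2^(x+y+1)), which the abstract itself does not spell out. The function names (`ac_log_pmf`, `poisson_log_pmf`, `ac_modes`, `kl_bits`) and the truncation limit `y_max` are illustrative choices.

```python
import math

def ac_log_pmf(y, x):
    """Log of the assumed A-C statistic p(y | x) for equal library sizes."""
    log_binom = math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
    return log_binom - (x + y + 1) * math.log(2)

def poisson_log_pmf(k, lam):
    """Log of the Poisson(lam) probability of count k (lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def ac_modes(x):
    """All counts y at which p(y | x) attains its maximum."""
    # The ratio p(y+1|x)/p(y|x) is below 1 for y >= x, so 0..2x+10 suffices.
    probs = [math.exp(ac_log_pmf(y, x)) for y in range(2 * x + 11)]
    top = max(probs)
    return [y for y, p in enumerate(probs) if math.isclose(p, top, rel_tol=1e-9)]

def kl_bits(lam, x, y_max=200):
    """K-L divergence, in bits, from Poisson(lam) to p(. | x), truncated at y_max."""
    total = 0.0
    for y in range(y_max):
        log_p = poisson_log_pmf(y, lam)
        total += math.exp(log_p) * (log_p - ac_log_pmf(y, x))
    return total / math.log(2)

if __name__ == "__main__":
    print("A-C modes given x = 5:", ac_modes(5))
    lam = 2.5
    best_x = min(range(10), key=lambda x: kl_bits(lam, x))
    print("x minimising the K-L divergence for lam = 2.5:", best_x)
```

Under this assumed form, p(y|x) with x >= 1 has its maximum at both y = x - 1 and y = x, which matches the two modes of a Poisson distribution with integer mean x. The divergence is computed on a truncated support, so the values are approximations.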
A rigorous underpinning of the Audic-Claverie methodology has been missing. Our results constitute a rigorous argument supporting the use of the Audic-Claverie method, even though SAGE libraries represent very sparse samples.