Fox Richard J, Dimmic Matthew W
Codexis, Inc., Redwood City, CA 94063, USA.
BMC Bioinformatics. 2006 Mar 10;7:126. doi: 10.1186/1471-2105-7-126.
Determining whether a gene is differentially expressed in two different samples remains an important statistical problem. Prior work in this area has featured the use of t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not display consistent behavior across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically.
A two-sample Bayesian t-test is proposed for determining whether a gene is differentially expressed in two different samples. The test extends earlier work that relied on point estimates of the variance. The method proposed here calculates, in analytic form, the marginal distribution of the difference in mean expression between the two samples, so that neither a point estimate of the variance nor posterior simulation is needed. The prior distribution involves a single hyperparameter that can be calculated in a statistically rigorous manner, making clear the connection between the prior degrees of freedom and the prior variance.
The test is easy to understand and implement. Application to both real and simulated data shows that the method has power equal to or greater than that of the previous method and demonstrates consistent Type I error rates. The test is generally applicable outside the microarray field to any situation where prior information about the variance is available, and it is not limited to cases where estimates of the variance are based on many similar observations.
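As an illustration of the kind of test described above, the following is a minimal sketch and not the authors' implementation. It assumes a common variance for both samples, flat priors on the two means, and a scaled inverse chi-square prior on the variance with `prior_df` degrees of freedom and scale `prior_var`. Under those assumptions the marginal posterior of the difference in means is a Student-t distribution with `prior_df + n1 + n2 - 2` degrees of freedom. The function names, the two-argument prior, and the Simpson's-rule tail integration are choices made for this example; the paper's single-hyperparameter parameterization and its procedure for estimating that hyperparameter are not reproduced here.

```python
import math
from statistics import mean, variance


def student_t_pdf(x, df):
    """Density of a standard Student-t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_tail(t, df, steps=2000):
    """P(|T| > |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = student_t_pdf(0.0, df) + student_t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    central_mass = 2.0 * total * h / 3
    return max(0.0, 1.0 - central_mass)


def bayes_t(x, y, prior_df, prior_var):
    """Return (t, posterior df, two-sided tail probability).

    Assumption for this sketch: prior_df and prior_var are supplied by the
    caller, for example from variance estimates of other genes.
    """
    n1, n2 = len(x), len(y)
    # Within-sample sums of squares.
    ss = (n1 - 1) * variance(x) + (n2 - 1) * variance(y)
    post_df = prior_df + n1 + n2 - 2
    # Posterior scale: prior and observed sums of squares combined.
    post_var = (prior_df * prior_var + ss) / post_df
    t = (mean(x) - mean(y)) / math.sqrt(post_var * (1 / n1 + 1 / n2))
    return t, post_df, two_sided_tail(t, post_df)


# Hypothetical expression values for one gene in two samples.
t0, df0, p0 = bayes_t([1, 2, 3], [2, 3, 4], prior_df=0, prior_var=1.0)
t1, df1, p1 = bayes_t([1, 2, 3], [2, 3, 4], prior_df=10, prior_var=0.5)
```

With `prior_df=0` the sketch reduces to the classical pooled-variance t-test. A larger `prior_df` adds degrees of freedom and pulls the variance estimate toward `prior_var`, which is where prior information about the variance enters the test.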