Deng Xutao, Xu Jun, Hui James, Wang Charles
Transcriptional Genomics Core, Cedars-Sinai Medical Center, David Geffen School of Medicine at UCLA, Los Angeles, CA 90048, USA.
Comput Methods Programs Biomed. 2009 Feb;93(2):124-39. doi: 10.1016/j.cmpb.2008.07.013. Epub 2008 Oct 7.
Identifying genes that are differentially expressed under different experimental conditions is a fundamental task in microarray studies. However, different ranking methods generate very different gene lists, and this could profoundly impact follow-up analyses and biological interpretation. Therefore, developing improved ranking methods is critical in microarray data analysis. We developed a new algorithm, the probabilistic fold change (PFC), which ranks genes based on a confidence interval estimate of fold change. We performed extensive testing using multiple benchmark data sources, including the MicroArray Quality Control (MAQC) data sets. We corroborated our observations on the MAQC data sets using qRT-PCR data sets and Latin square spike-in data sets. Along with PFC, we tested six other popular ranking algorithms: Mean Fold Change (FC), SAM, t-statistic (T), Bayesian-t (BAYT), Intensity-Conditional Fold Change (CFC), and Rank Product (RP). PFC achieved reproducibility and accuracy that were consistently among the best of the seven ranking algorithms, while the other ranking algorithms showed weaknesses in some cases. Contrary to common belief, our results demonstrated that statistical accuracy does not necessarily translate to biological reproducibility, and therefore both quality aspects need to be evaluated.
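The abstract does not give the PFC formula, so the following is only a minimal sketch of the general idea of ranking genes by a confidence interval estimate of fold change, not the authors' algorithm. It assumes log2 expression values for two conditions, a normal-approximation interval with a hypothetical multiplier `z = 1.96`, and illustrative function names (`ci_bound_log_fc`, `rank_genes`) that do not come from the paper. Each gene is scored by the end of its interval nearest zero, so a large but noisy fold change is ranked below a smaller, well-supported one.

```python
import math
from statistics import mean, variance


def ci_bound_log_fc(a, b, z=1.96):
    """Conservative log2 fold change of condition b versus condition a.

    Returns the confidence-interval bound nearest zero, or 0.0 when the
    interval contains zero. This is an illustrative score, not PFC itself.
    """
    d = mean(b) - mean(a)  # mean log2 fold change
    # Standard error of the difference in means (unequal variances).
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    shrunk = max(0.0, abs(d) - z * se)  # pull the estimate toward zero
    return math.copysign(shrunk, d)  # keep the direction of change


def rank_genes(data, z=1.96):
    """Rank gene names by the absolute conservative fold change, largest first.

    `data` maps a gene name to a pair (values_a, values_b) of log2 intensities.
    """
    scores = {gene: ci_bound_log_fc(a, b, z) for gene, (a, b) in data.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
```

Under these assumptions, a gene with a mean log2 change of 2 and tight replicates outranks a gene with a mean change of 3 whose replicates vary widely, whereas plain mean fold change would order them the other way.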