Probability fold change: a robust computational approach for identifying differentially expressed gene lists.

Author information

Deng Xutao, Xu Jun, Hui James, Wang Charles

Affiliation

Transcriptional Genomics Core, Cedars-Sinai Medical Center, David Geffen School of Medicine at UCLA, Los Angeles, CA 90048, USA.

Publication information

Comput Methods Programs Biomed. 2009 Feb;93(2):124-39. doi: 10.1016/j.cmpb.2008.07.013. Epub 2008 Oct 7.

Abstract

Identifying genes that are differentially expressed under different experimental conditions is a fundamental task in microarray studies. However, different ranking methods generate very different gene lists, which can profoundly affect follow-up analyses and biological interpretation. Developing improved ranking methods is therefore critical in microarray data analysis. We developed a new algorithm, the probabilistic fold change (PFC), which ranks genes based on a confidence interval estimate of fold change. We performed extensive testing using multiple benchmark data sources, including the MicroArray Quality Control (MAQC) data sets. We corroborated our observations on the MAQC data sets using qRT-PCR data sets and Latin square spike-in data sets. Along with PFC, we tested six other popular ranking algorithms: Mean Fold Change (FC), SAM, t-statistic (T), Bayesian-t (BAYT), Intensity-Conditional Fold Change (CFC), and Rank Product (RP). PFC achieved reproducibility and accuracy that were consistently among the best of the seven ranking algorithms, whereas the other ranking algorithms showed weaknesses in some cases. Contrary to common belief, our results demonstrated that statistical accuracy does not translate into biological reproducibility, and therefore both quality aspects need to be evaluated.
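
The abstract does not give the PFC formula, so the following is only an illustrative sketch of the general idea of ranking genes by a confidence interval estimate of fold change. It is not the published PFC algorithm. The function names, the normal approximation for the interval, the 95% level, and the toy data are all assumptions introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def conservative_log_fc(group_a, group_b, confidence=0.95):
    """Return the confidence-interval bound of the log2 fold change nearest zero.

    Inputs are log2 intensities for one gene under two conditions, with at
    least two replicates each. Returns 0.0 if the interval contains zero.
    Assumption: a normal-approximation interval, not the PFC estimator.
    """
    diff = mean(group_b) - mean(group_a)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower, upper = diff - z * se, diff + z * se
    if lower > 0:
        return lower
    if upper < 0:
        return upper
    return 0.0


def rank_genes(expression, confidence=0.95):
    """Order gene names by the magnitude of their conservative log2 fold change."""
    scores = {
        gene: conservative_log_fc(a, b, confidence)
        for gene, (a, b) in expression.items()
    }
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Hypothetical toy data: gene -> (condition A replicates, condition B replicates)
expression = {
    "gene_noisy_up": ([5.0, 3.0, 7.0], [8.0, 5.0, 11.0]),
    "gene_stable_up": ([5.0, 5.1, 4.9], [7.0, 7.1, 6.9]),
    "gene_flat": ([6.0, 6.1, 5.9], [6.0, 5.9, 6.1]),
}
ranking = rank_genes(expression)
```

In this toy data, the noisy gene has the larger mean fold change, but its interval contains zero, so an interval-based score places the tightly replicated gene first. This is the kind of difference between ranking methods that the abstract describes.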

