Kadota Koji, Nakai Yuji, Shimizu Kentaro
Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan.
Algorithms Mol Biol. 2009 Apr 22;4:7. doi: 10.1186/1748-7188-4-7.
To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix GeneChip system need to select both a preprocessing algorithm to obtain expression-level measurements and a way of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility.
We compared eight conventional gene ranking methods, namely weighted average difference (WAD), average difference (AD), fold change (FC), rank products (RP), moderated t statistic (modT), significance analysis of microarrays (samT), shrinkage t statistic (shrinkT), and intensity-based moderated t statistic (ibmT), in combination with six preprocessing algorithms (PLIER, VSN, FARMS, multi-mgMOS (mmgMOS), MBEI, and GCRMA). A total of 36 real experimental datasets were evaluated on the basis of the area under the receiver operating characteristic curve (AUC), which serves as a measure of both sensitivity and specificity. We found that the RP method performed well for VSN-, FARMS-, MBEI-, and GCRMA-preprocessed data, and that the WAD method performed well for mmgMOS-preprocessed data. Our analysis of the MicroArray Quality Control (MAQC) project's datasets showed that the FC-based gene ranking methods (WAD, AD, FC, and RP) had a higher level of reproducibility: the percentages of overlapping genes (POGs) across different sites were higher overall for the FC-based methods than for the t-statistic-based methods (modT, samT, shrinkT, and ibmT). In particular, POG values for WAD were the highest overall among the FC-based methods, irrespective of the choice of preprocessing algorithm.
Our results demonstrate that to increase sensitivity, specificity, and reproducibility in microarray analyses, we need to select suitable combinations of preprocessing algorithms and gene ranking methods. We recommend the use of FC-based methods, in particular RP or WAD.
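The two evaluation measures named above, AUC and POG, can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy log2 expression values, the gene names, and the helper functions are hypothetical, and the WAD formula (AD multiplied by a weight equal to the gene's min-max-scaled average log signal) is our reading of the definition in the authors' earlier work, which this abstract does not restate. POG is computed here as simple set overlap between two top-k lists and ignores the direction of regulation.

```python
from statistics import mean

def wad_scores(group_a, group_b):
    """Assumed WAD: average difference (AD) weighted by relative average log signal."""
    ad = [mean(b) - mean(a) for a, b in zip(group_a, group_b)]
    avg = [(mean(a) + mean(b)) / 2 for a, b in zip(group_a, group_b)]
    lo, hi = min(avg), max(avg)
    span = (hi - lo) or 1.0  # guard against identical average signals
    return [d * (x - lo) / span for d, x in zip(ad, avg)]

def auc(scores, is_deg):
    """AUC as the chance that a true DEG outscores a non-DEG (ties count 0.5)."""
    pos = [abs(s) for s, t in zip(scores, is_deg) if t]
    neg = [abs(s) for s, t in zip(scores, is_deg) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def top_genes(scores, names, k):
    """Names of the k genes with the largest absolute score."""
    order = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))
    return {names[i] for i in order[:k]}

def pog(top1, top2):
    """Percentage of overlapping genes between two top-k sets of equal size."""
    return 100.0 * len(top1 & top2) / len(top1)

# Hypothetical log2 expression values: one row per gene, two replicates per condition.
names = ["g1", "g2", "g3", "g4", "g5"]
truth = [True, False, False, True, False]  # assumed known DEG labels

site1_a = [[8.0, 8.2], [3.0, 3.1], [10.0, 10.2], [5.0, 5.1], [7.0, 7.0]]
site1_b = [[10.0, 10.2], [3.1, 3.0], [10.1, 10.1], [6.5, 6.6], [7.1, 7.1]]
site2_a = [[8.1, 8.0], [3.2, 3.0], [10.1, 10.0], [5.2, 5.0], [7.0, 7.2]]
site2_b = [[9.9, 10.1], [3.0, 3.2], [10.0, 10.2], [5.3, 5.1], [8.0, 8.2]]

wad1 = wad_scores(site1_a, site1_b)
wad2 = wad_scores(site2_a, site2_b)

print("AUC at site 1:", auc(wad1, truth))
print("POG of top-2 lists:", pog(top_genes(wad1, names, 2), top_genes(wad2, names, 2)))
```

In this sketch a higher AUC means the ranking places the known DEGs nearer the top, and a higher POG means the top-ranked lists from two sites agree more closely, which is the sense in which the abstract uses POG as a reproducibility measure.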