Barash Yoseph, Dehan Elinor, Krupsky Meir, Franklin Wilbur, Geraci Marc, Friedman Nir, Kaminski Naftali
School of Computer Science and Engineering, Hebrew University, Jerusalem, 91904, Israel.
Bioinformatics. 2004 Apr 12;20(6):839-46. doi: 10.1093/bioinformatics/btg487. Epub 2004 Jan 29.
The exponential increase in DNA microarray experiments in recent years has motivated the development of many signal quantitation (SQ) algorithms. These algorithms apply various transformations to the actual measurements so that researchers can quantitatively compare readings of different genes within one experiment and across separate experiments. However, it remains unclear whether there is a 'best' algorithm for quantitating microarray data. The ability to compare and assess such algorithms is crucial for any downstream analysis. In this work, we suggest a methodology for comparing different SQ algorithms for gene expression data. Our aim is to enable researchers to compare the effect of different SQ algorithms on the specific dataset they are working with. We combine two kinds of tests to assess the effect of an SQ algorithm in terms of signal-to-noise ratio. To assess noise, we exploit redundancy within the experimental dataset to test the variability of a given SQ algorithm's output. To assess the effect of the SQ algorithm on the signal, we evaluate the overabundance of differentially expressed genes using various statistical significance tests.
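The two kinds of tests can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the function names, the mean absolute replicate difference as a noise measure, and the exact permutation test as the significance test are all assumptions chosen to keep the example self-contained.

```python
from itertools import combinations
from statistics import mean


def replicate_noise(pairs):
    """Mean absolute difference between replicate readings of one gene.

    `pairs` is a list of (first, second) log-scale values taken from
    repeated hybridizations of the same sample. Smaller is less noisy.
    This noise measure is an illustrative assumption.
    """
    return mean(abs(a - b) for a, b in pairs)


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means.

    Enumerates every way of relabelling the pooled samples, so it is
    only practical for small groups. Stand-in for any significance test.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
    return hits / total


def overabundance(pvalues, threshold):
    """Observed count of genes with p <= threshold, and the count
    expected by chance if no gene were differentially expressed."""
    observed = sum(p <= threshold for p in pvalues)
    expected = threshold * len(pvalues)
    return observed, expected
```

Under these assumptions, an SQ algorithm looks better when its per-gene `replicate_noise` is lower and when the observed count from `overabundance` exceeds the chance expectation by a wider margin.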
We demonstrate our analysis approach with three SQ algorithms for oligonucleotide microarrays. We compare the results obtained with the dChip and RMAExpress software to those obtained with the standard Affymetrix MAS5 on a dataset containing pairs of repeated hybridizations. Our analysis suggests that dChip is more robust and stable than the MAS5 tools for about 60% of the genes, while RMAExpress achieves an even greater improvement in signal-to-noise ratio for more than 95% of the genes.
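A per-gene comparison of this kind could be tallied as in the sketch below. It assumes a per-gene noise score has already been computed for each SQ algorithm; the function name, the dictionary layout, and the use of a plain noise score (rather than the signal-to-noise measure used in this work) are hypothetical.

```python
def fraction_less_noisy(noise_candidate, noise_baseline):
    """Fraction of shared genes on which the candidate SQ algorithm
    has a strictly lower noise score than the baseline.

    Both arguments map a gene identifier to a noise score (hypothetical
    layout). Only genes present in both mappings are compared.
    """
    shared = noise_candidate.keys() & noise_baseline.keys()
    if not shared:
        raise ValueError("no genes in common between the two algorithms")
    wins = sum(noise_candidate[g] < noise_baseline[g] for g in shared)
    return wins / len(shared)
```

A returned value of 0.6 would correspond to the candidate being less noisy than the baseline on 60% of the genes measured by both.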