Bengtsson Anders, Bengtsson Henrik
Mathematical Statistics, Centre for Mathematical Sciences, Lund University, Box 118, SE-221 00 Lund, Sweden.
BMC Bioinformatics. 2006 Feb 28;7:96. doi: 10.1186/1471-2105-7-96.
In a microarray experiment, the difference in expression between genes on the same slide can be 10^3-fold or more. At low expression levels, even a small error in the estimate has a large influence on the final test and reference ratios. In addition to the true spot intensity, the scanned signal contains different kinds of noise, collectively referred to as background. To assess the true spot intensity, the background must be subtracted. The standard approach to estimating background intensities is to assume they are equal to the intensity levels between spots. In the literature, morphological opening is suggested to be one of the best methods for estimating background this way.
This paper examines fundamental properties of rank and quantile filters, which include morphological filters at the extremes, with a focus on their ability to estimate between-spot intensity levels. The bias and variance of these filter estimates are driven by the number of background pixels used and by their distributions. A new rank-filter algorithm is implemented and compared to the methods available in Spot by CSIRO and GenePix Pro by Axon Instruments. Spot's morphological opening has a mean bias between -47 and -248, compared to a bias between 2 and -2 for the rank filter, and the variability of the morphological opening estimate is 3 times higher than that of the rank filter. The mean bias of Spot's second method, morph.close.open, is between -5 and -16, and its variability is approximately the same as for morphological opening. The variability of GenePix Pro's region-based estimate is more than ten times higher than that of the rank-filter estimate, and its bias is slightly larger. The large variability arises because the size of the background window changes with spot size. To overcome this, a non-adaptive region-based method is implemented. Its bias and variability are comparable to those of the rank filter.
The performance of more advanced rank filters is equal to that of the best region-based methods. However, to obtain unbiased estimates, these filters have to be implemented with great care. The performance of morphological opening is in general poor, with a substantial spatially dependent bias.
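
The contrast between a mid-rank filter and morphological opening can be illustrated with a small simulation. The sketch below is not the algorithm implemented in the paper, nor the code used by Spot or GenePix Pro. It is a minimal one-dimensional illustration under assumed conditions: a flat background of 100 with Gaussian noise of standard deviation 10 and an 11-pixel window, all chosen here for illustration only. The function names `rank_filter`, `morphological_opening` and `simulate` are hypothetical. Opening is written as an erosion (minimum filter) followed by a dilation (maximum filter), which are the two extreme rank filters.

```python
import random
import statistics


def rank_filter(signal, half_width, rank_fraction):
    """Sliding-window rank filter on a 1-D list.

    rank_fraction = 0.0 gives the minimum (erosion), 0.5 the median,
    1.0 the maximum (dilation). Windows are truncated at the edges.
    """
    n = len(signal)
    out = []
    for i in range(n):
        window = sorted(signal[max(0, i - half_width):min(n, i + half_width + 1)])
        k = round(rank_fraction * (len(window) - 1))
        out.append(window[k])
    return out


def morphological_opening(signal, half_width):
    """Opening = erosion (min filter) followed by dilation (max filter)."""
    eroded = rank_filter(signal, half_width, 0.0)
    return rank_filter(eroded, half_width, 1.0)


def simulate(seed=0, n=2000, true_background=100.0, noise_sd=10.0, half_width=5):
    """Return (median-filter bias, opening bias) on pure background noise.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    signal = [rng.gauss(true_background, noise_sd) for _ in range(n)]
    median_est = rank_filter(signal, half_width, 0.5)
    opening_est = morphological_opening(signal, half_width)
    median_bias = statistics.mean(median_est) - true_background
    opening_bias = statistics.mean(opening_est) - true_background
    return median_bias, opening_bias


median_bias, opening_bias = simulate()
print(f"median-filter bias: {median_bias:.2f}")
print(f"opening bias:       {opening_bias:.2f}")
```

In this toy setting the opening estimate sits well below the true background because it is built from window minima, while the median stays close to the true level. The size of the effect depends on the assumed noise level and window size, so the numbers are not comparable to the biases reported above.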