Li Qunhua, Fraley Chris, Bumgarner Roger E, Yeung Ka Yee, Raftery Adrian E
Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195, USA.
Bioinformatics. 2005 Jun 15;21(12):2875-82. doi: 10.1093/bioinformatics/bti447. Epub 2005 Apr 21.
Inner holes, artifacts and blank spots are common in microarray images, but current image analysis methods do not pay them enough attention. We propose a new robust model-based method for processing microarray images so as to estimate foreground and background intensities. The method starts with a very simple but effective automatic gridding method, and then proceeds in two steps. The first step applies model-based clustering to the distribution of pixel intensities, using the Bayesian Information Criterion (BIC) to choose the number of groups up to a maximum of three. The second step is spatial, finding the large spatially connected components in each cluster of pixels. The method thus combines the strengths of the histogram-based and spatial approaches. It deals effectively with inner holes in spots and with artifacts. It also provides a formal inferential basis for deciding when the spot is blank, namely when the BIC favors one group over two or three.
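The two steps can be illustrated with a minimal sketch. This is not the spotSegmentation or MCLUST code: it is a hand-written Python stand-in that assumes a one-dimensional Gaussian mixture with unequal variances fitted by EM, the convention BIC = 2 log L - p log n (larger is better), and 4-connectivity for the spatial step. The function names, the synthetic 12 x 12 "donut" spot, the single bright artifact pixel and the `min_size` threshold are all hypothetical choices made for illustration.

```python
import math
import random
from collections import deque

def _log_norm(v, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)

def _e_step(x, weights, means, variances):
    """Return (log-likelihood, responsibilities) under the current parameters."""
    loglik, resp = 0.0, []
    for v in x:
        logs = [math.log(w) + _log_norm(v, m, s2)
                for w, m, s2 in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp guards against underflow
        norm = top + math.log(sum(math.exp(l - top) for l in logs))
        loglik += norm
        resp.append([math.exp(l - norm) for l in logs])
    return loglik, resp

def fit_mixture(x, k, iters=100):
    """EM for a k-group 1-D Gaussian mixture; returns (BIC, means, responsibilities)."""
    n, xs = len(x), sorted(x)
    grand = sum(x) / n
    spread = max(sum((v - grand) ** 2 for v in x) / n, 1e-12)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile start
    variances, weights = [spread] * k, [1.0 / k] * k
    for _ in range(iters):
        _, resp = _e_step(x, weights, means, variances)
        for j in range(k):
            nk = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nk
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nk
            variances[j] = max(var, 1e-6 * spread)  # floor avoids collapse
            weights[j] = max(nk / n, 1e-12)
    loglik, resp = _e_step(x, weights, means, variances)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return 2 * loglik - n_params * math.log(n), means, resp

def cluster_pixels(x, max_groups=3):
    """Step 1: pick the group count by BIC; labels are ranked from dimmest (0) up."""
    fits = [(k,) + fit_mixture(x, k) for k in range(1, max_groups + 1)]
    k, _, means, resp = max(fits, key=lambda f: f[1])
    order = sorted(range(k), key=lambda j: means[j])
    rank = {j: pos for pos, j in enumerate(order)}
    labels = [rank[max(range(k), key=lambda j: row[j])] for row in resp]
    return k, labels

def large_components(grid, target, min_size):
    """Step 2: pixels labelled `target` lying in a 4-connected component of >= min_size."""
    rows, cols = len(grid), len(grid[0])
    seen, keep = set(), set()
    for start in ((i, j) for i in range(rows) for j in range(cols)):
        if start in seen or grid[start[0]][start[1]] != target:
            continue
        seen.add(start)
        comp, queue = [start], deque([start])
        while queue:  # breadth-first flood fill
            i, j = queue.popleft()
            for p in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if (0 <= p[0] < rows and 0 <= p[1] < cols and p not in seen
                        and grid[p[0]][p[1]] == target):
                    seen.add(p)
                    comp.append(p)
                    queue.append(p)
        if len(comp) >= min_size:
            keep.update(comp)
    return keep

# Synthetic spot (hypothetical data): a bright ring with a dim inner hole,
# plus one isolated bright artifact pixel in the corner.
rng = random.Random(0)
ring = {(i, j) for i in range(12) for j in range(12)
        if 3 < (i - 5.5) ** 2 + (j - 5.5) ** 2 < 20}
artifact = (0, 0)
image = [[rng.gauss(1000, 50) if (i, j) in ring or (i, j) == artifact
          else rng.gauss(100, 10) for j in range(12)] for i in range(12)]

k, labels = cluster_pixels([v for row in image for v in row])
grid = [labels[i * 12:(i + 1) * 12] for i in range(12)]
foreground = large_components(grid, target=k - 1, min_size=5)
print(k, len(foreground), artifact in foreground)
```

In this sketch, a spot would be treated as blank when `cluster_pixels` returns `k == 1`. The inner hole falls into a dimmer cluster and so stays out of the foreground estimate, while the artifact pixel lands in the brightest cluster but is discarded by the size threshold in the spatial step.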
We apply our methods for gridding and segmentation to cDNA microarray images from an HIV infection experiment. In these experiments, our method had better stability across replicates than a fixed-circle segmentation method or the seeded region growing method in the SPOT software, without introducing noticeable bias when estimating the intensities of differentially expressed genes.
spotSegmentation, an R language package implementing both the gridding and segmentation methods, is available through the Bioconductor project (http://www.bioconductor.org). The segmentation method requires the contributed R package MCLUST for model-based clustering (http://cran.us.r-project.org).