Computational Biology and Bioinformatics Graduate Program, Duke University, Durham, NC 27708, USA.
Bioinformatics. 2010 Mar 15;26(6):761-9. doi: 10.1093/bioinformatics/btp658. Epub 2009 Nov 26.
Recent advances in high-throughput imaging have produced large datasets containing tens of thousands of gene expression images. Methods for capturing these spatial and/or temporal expression patterns include in situ hybridization and fluorescent reporter constructs or tags, yet results are still frequently assessed by subjective, qualitative comparison. To handle such large datasets, fully automated analysis methods must be developed to properly normalize and model spatial expression patterns.
We have developed image segmentation and registration methods to identify and extract spatial gene expression patterns from RNA in situ hybridization experiments of Drosophila embryos. These methods allow us to normalize and extract expression information from 78,621 images of 3724 genes across six time stages. The similarity between gene expression patterns is computed using four scoring metrics: mean squared error, Haar wavelet distance, mutual information and spatial mutual information (SMI). We additionally propose a strategy to calculate the significance of the similarity between two expression images by generating surrogate datasets with similar spatial expression patterns using a Monte Carlo swap sampler. On data from an early developmental time stage, we show that SMI provides the most biologically relevant metric of comparison, and that our significance testing generalizes across metrics so that they achieve similar performance. We illustrate the application of spatial metrics on the well-known Drosophila segmentation network.
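The abstract names the scoring metrics and the surrogate-based significance test without giving implementation details. The following is a minimal sketch, assuming two images already registered to a common grid and discretized to a few integer intensity levels. It covers two of the four metrics (mean squared error and plain mutual information) plus an empirical p-value. The surrogate generator `swap_surrogate` and the helper `empirical_p_value` are hypothetical names. They are a simpler stand-in, not the authors' Monte Carlo swap sampler: random pixel swaps preserve an image's intensity histogram but destroy its spatial structure, whereas the paper's sampler is meant to produce surrogates with similar spatial patterns. Haar wavelet distance and SMI are not shown.

```python
import math
import random
from collections import Counter


def mean_squared_error(a, b):
    """Mean squared pixel difference between two equally sized 2-D images."""
    total, n = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            n += 1
    return total / n


def mutual_information(a, b):
    """Mutual information (bits) between the discrete pixel values of two images."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def swap_surrogate(image, n_swaps, rng):
    """Hypothetical stand-in surrogate: apply n_swaps random pixel-pair swaps.

    Keeps the intensity histogram but, unlike the paper's sampler,
    does NOT try to keep spatial structure.
    """
    flat = [v for row in image for v in row]
    for _ in range(n_swaps):
        i = rng.randrange(len(flat))
        j = rng.randrange(len(flat))
        flat[i], flat[j] = flat[j], flat[i]
    width = len(image[0])
    return [flat[r * width:(r + 1) * width] for r in range(len(image))]


def empirical_p_value(a, b, metric, n_surrogates=200, n_swaps=None,
                      higher_is_more_similar=True, seed=0):
    """Fraction of surrogates scoring at least as similar as the observed pair."""
    rng = random.Random(seed)
    observed = metric(a, b)
    if n_swaps is None:
        n_swaps = 10 * len(b) * len(b[0])  # assumption: enough swaps to mix
    hits = 0
    for _ in range(n_surrogates):
        score = metric(a, swap_surrogate(b, n_swaps, rng))
        more_extreme = score >= observed if higher_is_more_similar else score <= observed
        if more_extreme:
            hits += 1
    # +1 correction keeps the estimate strictly positive
    return (hits + 1) / (n_surrogates + 1)


if __name__ == "__main__":
    # Toy 8x8 stripe pattern (made-up data): 1 = expressed, 0 = not expressed.
    stripe = [[1 if 2 <= c < 4 else 0 for c in range(8)] for _ in range(8)]
    print(mean_squared_error(stripe, stripe))  # 0.0
    print(mutual_information(stripe, stripe))
    print(empirical_p_value(stripe, stripe, mutual_information))
    print(empirical_p_value(stripe, stripe, mean_squared_error,
                            higher_is_more_similar=False))
```

For distance-like scores such as mean squared error, lower means more similar, hence the `higher_is_more_similar` flag. Because shuffled pixels are far less spatially structured than real expression patterns, p-values from this stand-in will tend to be optimistic, which is the gap a spatially aware sampler is meant to close.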
A Java webstart application to register and compare patterns, as well as all source code, are available from: http://tools.genome.duke.edu/generegulation/image_analysis/insitu
Supplementary data are available at Bioinformatics online.