Gurunathan Rajalakshmi, Van Emden Bernard, Panchanathan Sethuraman, Kumar Sudhir
Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA.
BMC Bioinformatics. 2004 Dec 16;5:202. doi: 10.1186/1471-2105-5-202.
Modern developmental biology relies heavily on the analysis of embryonic gene expression patterns. Investigators manually inspect hundreds or thousands of expression patterns to identify those that are spatially similar and, ultimately, to infer potential gene interactions. However, the rapid accumulation of gene expression pattern data over the last two decades, facilitated by high-throughput techniques, has created a need for efficient approaches that compare the images directly, rather than their textual descriptions, to identify spatially similar expression patterns.
We compared the effectiveness of Binary Feature Vector (BFV) and Invariant Moment Vector (IMV) based digital representations of gene expression patterns in finding biologically meaningful patterns, using a small (226 images) and a large (1819 images) dataset. For each dataset, an ordered list of images was generated with respect to a query image to identify overlapping and similar gene expression patterns, in a manner comparable to what a developmental biologist might do. The results showed that the BFV representation consistently outperforms the IMV representation in finding biologically meaningful matches when both the spatial overlap of the gene expression patterns and the genes involved are considered. Furthermore, we explored the value of conducting image-content based searches in a dataset in which the individual expression components (or domains) of multi-domain expression patterns were also included as separate entries. We found that this technique improves the performance of both IMV and BFV based searches.
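The comparison described above can be illustrated with a small sketch. The abstract does not specify how the vectors are built or which similarity measures are used, so everything below is an assumption made for illustration: each image is a tiny binary grid, the BFV is that grid flattened, BFV similarity is the Jaccard overlap of expressing pixels, the IMV holds only the first two Hu invariant moments, and IMV distance is Euclidean. The function and dataset names are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # binary image: 1 = expression, 0 = background


def to_bfv(image: Image) -> List[int]:
    """Flatten a binary image row by row into a binary feature vector."""
    return [pixel for row in image for pixel in row]


def bfv_similarity(a: List[int], b: List[int]) -> float:
    """Assumed measure: Jaccard overlap of the expressing pixels."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def invariant_moments(image: Image) -> Tuple[float, float]:
    """First two Hu invariant moments of a binary image."""
    pts = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    m00 = len(pts)
    if m00 == 0:
        return (0.0, 0.0)
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    # Normalised central moments of order 2: eta_pq = mu_pq / m00**2
    n20, n02, n11 = mu20 / m00**2, mu02 / m00**2, mu11 / m00**2
    return (n20 + n02, (n20 - n02) ** 2 + 4 * n11**2)


def imv_distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Assumed measure: Euclidean distance between moment vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def rank_by_bfv(query: Image, database: Dict[str, Image]) -> List[str]:
    """Order database images from most to least similar to the query."""
    q = to_bfv(query)
    return sorted(database,
                  key=lambda name: bfv_similarity(q, to_bfv(database[name])),
                  reverse=True)


def rank_by_imv(query: Image, database: Dict[str, Image]) -> List[str]:
    """Order database images from smallest to largest moment distance."""
    q = invariant_moments(query)
    return sorted(database,
                  key=lambda name: imv_distance(q, invariant_moments(database[name])))


# Toy data (hypothetical): a query pattern and three database patterns.
query = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0]]
database = {
    "anterior":  [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]],
    "posterior": [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]],
    "broad":     [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]],
}

print(rank_by_bfv(query, database))  # ['anterior', 'broad', 'posterior']
print(rank_by_imv(query, database))  # ['posterior', 'anterior', 'broad']
```

In this toy case the "posterior" pattern is a translated copy of the query, so its invariant moments match the query exactly and it ranks first under the moment-based ordering, even though it shares no expressing pixels with the query. The overlap-based ordering places it last. This shows only how the two representations can disagree on spatial overlap; it is not a reproduction of the study's method or results.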
We conclude that the BFV representation consistently produces a more extensive and more accurate list of biologically useful patterns than the IMV representation. The quality of the results is maintained as the search database grows, which encourages efforts to build automated image query and retrieval systems for spatial gene expression patterns.