Qian Yuntao, Murphy Robert F
Center for Bioimage Informatics and Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA.
Bioinformatics. 2008 Feb 15;24(4):569-76. doi: 10.1093/bioinformatics/btm561. Epub 2007 Nov 22.
There is extensive interest in automating the collection, organization and analysis of biological data. Data in the form of images in online literature present special challenges for such efforts. The first steps in understanding the contents of a figure are decomposing it into panels and determining the type of each panel. In biological literature, panel types include many kinds of images collected by different techniques, such as photographs of gels or images from microscopes. We have previously described the SLIF system (http://slif.cbi.cmu.edu) that identifies panels containing fluorescence microscope images among figures in online journal articles as a prelude to further analysis of the subcellular patterns in such images. This system contains a pretrained classifier that uses image features to assign a type (class) to each separate panel. However, the types of panels in a figure are often correlated, so that we can consider the class of a panel to be dependent not only on its own features but also on the types of the other panels in a figure.
In this article, we introduce the use of a type of probabilistic graphical model, a factor graph, to represent the structured information about the images in a figure and to permit more robust and accurate inference about their types. We obtain a significant improvement over the results obtained when each panel is classified separately.
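As a rough illustration of why modeling a figure's panels jointly can help, the following minimal sketch builds a toy factor graph over them. It is not the model described in this article: the function names, the `same_type_bonus` pairwise potential, and the example probabilities are all hypothetical, and exact inference is done by brute-force enumeration, which is only feasible for a handful of panels, rather than by message passing.

```python
from itertools import product


def joint_marginals(unary, same_type_bonus=2.0):
    """Per-panel marginals under a toy factor graph.

    unary[i][c] is a per-panel classifier score for panel i having class c
    (a unary factor). Every pair of panels shares a pairwise factor that
    multiplies the score by same_type_bonus when the two labels agree.
    Both factors are illustrative assumptions, not the paper's potentials.
    """
    n_panels = len(unary)
    n_classes = len(unary[0])
    marginals = [[0.0] * n_classes for _ in range(n_panels)]
    total = 0.0
    # Enumerate every joint labeling of the figure (exponential in n_panels).
    for labels in product(range(n_classes), repeat=n_panels):
        score = 1.0
        for i, c in enumerate(labels):
            score *= unary[i][c]
        for i in range(n_panels):
            for j in range(i + 1, n_panels):
                if labels[i] == labels[j]:
                    score *= same_type_bonus
        total += score
        for i, c in enumerate(labels):
            marginals[i][c] += score
    # Normalize so each panel's marginal sums to 1.
    return [[m / total for m in row] for row in marginals]


def classify(unary, same_type_bonus=2.0):
    """Pick the class with the highest marginal for each panel."""
    marg = joint_marginals(unary, same_type_bonus)
    return [max(range(len(row)), key=row.__getitem__) for row in marg]


# Hypothetical classifier outputs for three panels and two classes
# (class 0 = fluorescence micrograph, class 1 = other).
unary = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55]]
independent = classify(unary, same_type_bonus=1.0)  # no coupling
joint = classify(unary, same_type_bonus=2.0)        # panels coupled
```

With a bonus of 1.0 the pairwise factors have no effect and each panel keeps its own most probable class, so the ambiguous third panel is labeled class 1. With a bonus of 2.0 the two confident neighbors pull the third panel to class 0.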
The code and data used for the experiments described here are available from http://murphylab.web.cmu.edu/software.