Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China.
Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China.
Bioinformatics. 2019 Aug 15;35(16):2834-2842. doi: 10.1093/bioinformatics/bty1064.
In the post-genomic era, image-based transcriptomics has received considerable attention, because visualizing the distribution of gene expression can reveal spatial and temporal expression patterns, which is essential for understanding biological mechanisms. The Berkeley Drosophila Genome Project has collected a large-scale spatial gene expression database for studying Drosophila embryogenesis. Given the expression images, the next urgent task is to annotate them for the study of Drosophila embryonic development. Automatic tools are highly desirable to speed up this labor-intensive labeling work. However, conventional image annotation tools are not applicable here, because labeling is performed at the gene level rather than the image level: each gene is represented by a bag of multiple related images (a multi-instance phenomenon), and image quality varies with image orientation and experiment batch. Moreover, different local regions of an image correspond to different controlled vocabulary (CV) annotation terms, i.e. an image carries multiple labels. Designing an accurate annotation tool in such a multi-instance multi-label scenario is very challenging.
To address these challenges, we developed AnnoFly, a new annotator for fruit fly embryonic images. Driven by an attention-enhanced RNN model, it weights images of differing quality so as to focus on the most informative image patterns. We assessed the new model on three standard datasets. The experimental results show that the attention-based model offers a transparent way to identify the images most important for labeling, and that it substantially improves accuracy compared with existing annotation methods, including both single-instance and multi-instance learning approaches.
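To illustrate the multi-instance idea described above, the sketch below shows generic attention pooling over a bag of image features: each image in a gene's bag receives a learned attention weight, and the weighted sum forms a single gene-level representation. This is a minimal NumPy sketch of the general technique, not the authors' actual AnnoFly architecture; the function name, feature dimensions, and parameter shapes are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(bag_feats, V, w):
    """Pool a bag of per-image feature vectors into one gene-level vector.

    bag_feats: (n_images, d) features, one row per image in the bag
    V:         (d, d) projection of a small tanh attention head (illustrative)
    w:         (d,) scoring vector producing one scalar score per image
    Returns the attention weights and the weighted bag embedding.
    """
    scores = np.tanh(bag_feats @ V) @ w   # one relevance score per image
    alpha = softmax(scores)               # weights over the bag, sum to 1
    return alpha, alpha @ bag_feats       # (n_images,), (d,)

# Toy example: a bag of 4 images with 8-dimensional features each.
rng = np.random.default_rng(0)
bag = rng.normal(size=(4, 8))
V = rng.normal(size=(8, 8))
w = rng.normal(size=8)
alpha, pooled = attention_pool(bag, V, w)
```

Because the weights `alpha` sum to one, they can be read directly as the relative importance of each image in the bag, which is the kind of transparency the attention mechanism provides; a per-term sigmoid classifier on the pooled vector would then produce the multi-label CV predictions.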
http://www.csbio.sjtu.edu.cn/bioinf/annofly/.
Supplementary data are available at Bioinformatics online.