Zhang Wenlu, Li Rongjian, Zeng Tao, Sun Qian, Kumar Sudhir, Ye Jieping, Ji Shuiwang
Department of Computer Science, Old Dominion University, Norfolk, VA, 23529.
School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99163.
IEEE Trans Big Data. 2020 Jun;6(2):322-333. doi: 10.1109/tbdata.2016.2573280. Epub 2016 May 30.
A central theme in learning from image data is to develop appropriate representations for the specific task at hand. Thus, a practical challenge is to determine which features are appropriate for a given task. For example, in the study of gene expression patterns in Drosophila melanogaster, texture features were particularly effective for determining developmental stages from in situ hybridization (ISH) images. Such an image representation is, however, not suitable for controlled vocabulary (CV) term annotation. Here, we developed feature extraction methods to generate hierarchical representations of ISH images. Our approach is based on deep convolutional neural networks that act directly on image pixels. To make the extracted features generic, the models were trained on a natural image set with millions of labeled examples. These models were then transferred to the ISH image domain. To account for the differences between the source and target domains, we proposed a partial transfer learning scheme in which only part of the source model is transferred. We then employed a multi-task learning method to fine-tune the pre-trained models with labeled ISH images. Results showed that feature representations computed by deep models based on transfer and multi-task learning significantly outperformed other methods for annotating gene expression patterns across different stage ranges.
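The abstract gives no implementation details, so the following is a minimal sketch of the two ideas it describes: partial transfer of a CNN pre-trained on natural images (only the lower layers are reused) and multi-task fine-tuning with one annotation head per task (e.g., per stage range). It assumes a PyTorch/torchvision setup; the backbone choice (AlexNet), the number of transferred layers, the layer sizes, and the name PartialTransferMultiTaskNet are all illustrative assumptions, not the authors' architecture.

    # Sketch only (NOT the authors' implementation): partial transfer + multi-task heads.
    import torch
    import torch.nn as nn
    from torchvision import models

    class PartialTransferMultiTaskNet(nn.Module):
        def __init__(self, num_terms_per_task, num_transferred_layers=6):
            super().__init__()
            source = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
            # Partial transfer: keep only the first few layers of the source model
            # (generic edge/texture filters); later layers are learned on ISH data.
            self.transferred = nn.Sequential(
                *list(source.features.children())[:num_transferred_layers]
            )
            # Domain-specific layers trained from scratch on the target (ISH) domain.
            self.domain_specific = nn.Sequential(
                nn.Conv2d(192, 256, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d((6, 6)),
                nn.Flatten(),
                nn.Linear(256 * 6 * 6, 512),
                nn.ReLU(inplace=True),
            )
            # Multi-task heads: one multi-label classifier per task, sharing features.
            self.heads = nn.ModuleList(
                [nn.Linear(512, n_terms) for n_terms in num_terms_per_task]
            )

        def forward(self, x):
            shared = self.domain_specific(self.transferred(x))
            return [head(shared) for head in self.heads]  # one logit tensor per task

    # Toy usage: three hypothetical stage-range tasks with 10 CV terms each.
    model = PartialTransferMultiTaskNet(num_terms_per_task=[10, 10, 10])
    images = torch.randn(4, 3, 224, 224)                   # batch of 4 RGB images
    targets = [torch.randint(0, 2, (4, 10)).float() for _ in range(3)]
    criterion = nn.BCEWithLogitsLoss()
    # Joint multi-task loss: sum of per-task multi-label losses over shared features.
    loss = sum(criterion(out, tgt) for out, tgt in zip(model(images), targets))
    loss.backward()

Consistent with the abstract, the transferred layers here are fine-tuned together with the new layers on labeled ISH images; freezing them instead would be a simpler variant of the same scheme.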