Visualization and Intelligent Systems Laboratory, University of California, Riverside, CA 92521, USA.
Department of Bioengineering, University of California, Riverside, CA 92521, USA.
Sensors (Basel). 2021 Dec 29;22(1):206. doi: 10.3390/s22010206.
Neural network training on biological images frequently suffers from a lack of data, resulting in inefficient network learning. This issue stems from the time, resources, and difficulty involved in cellular experimentation and data collection. For example, when performing experimental analysis, a researcher may need to reserve most of their data for testing rather than model training. The goal of this paper is therefore to perform dataset augmentation using generative adversarial networks (GANs) to increase the classification accuracy of deep convolutional neural networks (CNNs) trained on induced pluripotent stem cell microscopy images. The main challenges are (1) modeling complex data using GANs and (2) training neural networks on augmented datasets that contain generated data. To address these challenges, a temporally constrained, hierarchical classification scheme that exploits domain knowledge is employed for model learning. First, image patches of cell colonies from gray-scale microscopy images are generated using GANs; these generated images are then added to the real dataset and used to address class imbalances at multiple stages of training. Overall, a 2% increase in both true positive rate and F1-score is observed with this method compared to a straightforward, imbalanced classification network, with some greater improvements on a classwise basis. This work demonstrates that synergistic model design involving domain knowledge is key for biological image analysis and improves model learning in high-throughput scenarios.
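As a rough illustration of the class-balancing step described above, the following Python sketch tops up under-represented classes with generated samples until each class matches the largest one. It is not the authors' implementation: the function names `synthetic_quota` and `balance_with_generated`, the class labels, and the choice to balance to the majority-class count are all assumptions made for illustration, and the sketch covers a single training stage rather than the hierarchical, multi-stage scheme.

```python
import random
from collections import Counter


def synthetic_quota(labels, target=None):
    """Return how many generated samples each class needs to reach `target`.

    Assumption: when no target is given, balance to the largest class.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}


def balance_with_generated(real, generated, seed=0):
    """Combine real samples with generated ones to reduce class imbalance.

    real:      list of (sample, label) pairs from the real dataset
    generated: dict mapping label -> list of GAN-generated samples
    Hypothetical helper; the pool of generated samples is assumed to exist.
    """
    rng = random.Random(seed)
    quota = synthetic_quota([label for _, label in real])
    augmented = list(real)
    for cls, need in quota.items():
        pool = generated.get(cls, [])
        take = min(need, len(pool))  # never draw more than the pool holds
        augmented.extend((sample, cls) for sample in rng.sample(pool, take))
    return augmented


# Hypothetical toy data: class "a" has three real patches, class "b" has one.
real = [("r0", "a"), ("r1", "a"), ("r2", "a"), ("r3", "b")]
generated = {"b": ["g0", "g1", "g2"]}
balanced = balance_with_generated(real, generated)
class_counts = Counter(label for _, label in balanced)
```

In this toy case the minority class receives two generated patches, so both classes end up with three samples. In a multi-stage setting the same helper could be called once per stage with that stage's labels, though how the paper partitions its stages is not specified here.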