Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Google Research, 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA.
Bioinformatics. 2020 Dec 30;36(Suppl_2):i875-i883. doi: 10.1093/bioinformatics/btaa819.
Advances in automation and imaging have made it possible to capture a large image dataset that spans multiple experimental batches of data. However, accurate biological comparison across the batches is challenged by batch-to-batch variation (i.e. batch effect) due to uncontrollable experimental noise (e.g. varying stain intensity or cell density). Previous approaches to minimize the batch effect have commonly focused on normalizing the low-dimensional image measurements such as an embedding generated by a neural network. However, normalization of the embedding could suffer from over-correction and alter true biological features (e.g. cell size) due to our limited ability to interpret the effect of the normalization on the embedding space. Although techniques like flat-field correction can be applied to normalize the image values directly, they are limited transformations that handle only simple artifacts due to batch effect.
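To make concrete why flat-field correction counts as a "limited transformation", here is a minimal sketch of the conventional per-pixel formula, corrected = (raw - dark) * mean(flat - dark) / (flat - dark). It is the textbook form, not code from the paper; the names `flat_field_correct`, `raw`, `flat`, `dark` and `eps` are illustrative assumptions. The correction only rescales each pixel by a fixed gain map, so it can remove uneven illumination but not batch-dependent changes such as staining differences.

```python
import numpy as np


def flat_field_correct(raw, flat, dark, eps=1e-8):
    """Conventional flat-field correction (illustrative sketch).

    raw  : image to correct
    flat : image of a uniform reference field (captures uneven illumination)
    dark : image taken with no light (captures the sensor offset)
    """
    raw = np.asarray(raw, dtype=np.float64)
    flat = np.asarray(flat, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)

    # Per-pixel gain map; the floor avoids division by zero at dead pixels.
    gain = np.maximum(flat - dark, eps)

    # Rescale by the mean gain so the overall intensity level is preserved.
    return (raw - dark) * gain.mean() / gain


# Synthetic example: a uniform scene seen through a left-to-right vignette.
vignette = np.linspace(0.5, 1.0, 8).reshape(1, 8).repeat(4, axis=0)
dark = np.full((4, 8), 5.0)
raw = 100.0 * vignette + dark      # uniform scene, unevenly lit
flat = 200.0 * vignette + dark     # uniform reference, same illumination
corrected = flat_field_correct(raw, flat, dark)
```

In this synthetic case the vignette cancels and `corrected` is constant across the image; nothing in the formula can model artifacts that are not a fixed multiplicative or additive pattern.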
We present a neural network-based batch equalization method that can transfer images from one batch to another while preserving the biological phenotype. The equalization method is trained as a generative adversarial network (GAN), using the StarGAN architecture that has shown considerable ability in style transfer. After incorporating new objectives that disentangle batch effect from biological features, we show that the equalized images have less batch information and preserve the biological information. We also demonstrate that the same model training parameters can generalize to two dramatically different types of cells, indicating this approach could be broadly applicable.
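As a rough illustration of the kind of objective a StarGAN-style generator is trained with, the sketch below combines the three terms of the standard StarGAN formulation, with experimental batch playing the role of the style domain: an adversarial term, a batch-classification term, and a cycle-reconstruction term. This is an assumption-laden sketch, not the authors' implementation; the additional disentangling objectives mentioned above are not described in this abstract and are not reproduced here. All function and argument names are hypothetical, and the default weights follow the original StarGAN paper rather than this work.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between row-wise logits and integer class labels."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def stargan_generator_loss(critic_fake, batch_logits_fake, target_batch,
                           real, cycled, lambda_cls=1.0, lambda_rec=10.0):
    """Standard StarGAN generator objective (sketch; names are hypothetical).

    critic_fake       : critic scores for images translated to the target batch
    batch_logits_fake : batch-classifier logits for the translated images
    target_batch      : integer id of the batch each image was translated to
    real              : original images
    cycled            : images translated to the target batch and back again
    """
    # Adversarial term (Wasserstein form): translated images should look real.
    adv = -np.mean(critic_fake)
    # Classification term: translated images should be assigned the target batch.
    cls = softmax_cross_entropy(batch_logits_fake, target_batch)
    # Cycle term: translating back should recover the original image content.
    rec = np.mean(np.abs(np.asarray(real) - np.asarray(cycled)))
    return adv + lambda_cls * cls + lambda_rec * rec


# Toy inputs: 4 images of 8x8 pixels, 2 batches, an uninformative classifier.
real = np.ones((4, 8, 8))
loss_perfect_cycle = stargan_generator_loss(
    critic_fake=np.zeros(4),
    batch_logits_fake=np.zeros((4, 2)),
    target_batch=[0, 1, 0, 1],
    real=real,
    cycled=real,
)
```

The cycle term is what ties the translated image to the content of the original; on its own it does not guarantee that biological phenotype is preserved, which is presumably why the method adds further objectives.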
Code is available at https://github.com/tensorflow/gan/tree/master/tensorflow_gan/examples/stargan.
Supplementary data are available at Bioinformatics online.