College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China.
Department of Artificial Intelligence, Faculty of Computer Science, Campus de Montegancedo, Polytechnical University of Madrid, Boadilla del Monte, 28660 Madrid, Spain.
Int J Mol Sci. 2022 Feb 14;23(4):2082. doi: 10.3390/ijms23042082.
Batch-specific differences must be eliminated when integrating single-cell RNA-sequencing (scRNA-seq) datasets generated under different experimental conditions for downstream analysis. Existing batch correction methods usually map cells from different batches onto a single preselected "anchor" batch or into a low-dimensional embedding space, and so cannot take full advantage of useful information from multiple sources. We present a novel framework, IMGG, which integrates multiple single-cell datasets through connected graphs and generative adversarial networks (GANs) to eliminate nonbiological differences between batches. Compared with current methods, IMGG performs well on a variety of evaluation metrics, and the IMGG-corrected gene expression data incorporate features from multiple batches, enabling downstream tasks such as differential gene expression analysis.