Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain.
PLoS One. 2023 Feb 3;18(2):e0281315. doi: 10.1371/journal.pone.0281315. eCollection 2023.
Recent progress in single-cell genomics has produced a variety of library protocols and techniques for molecular profiling. We formulate a unifying, data-driven, integrative, and predictive methodology that handles different libraries, samples, and both paired and unpaired data modalities. Our design of scAEGAN integrates an autoencoder (AE) network with adversarial learning by a cycleGAN (cGAN) network. The AE learns a low-dimensional embedding of each condition, whereas the cGAN learns a non-linear mapping between the AE representations. We evaluate scAEGAN on simulated data and real scRNA-seq datasets, across different library preparations (Fluidigm C1, CelSeq, CelSeq2, SmartSeq) and several data modalities, such as paired scRNA-seq and scATAC-seq. scAEGAN outperforms Seurat 3 in library integration, is more robust against data sparsity, and outperforms Seurat 4 in integrating paired data from the same cell. Furthermore, in predicting one data modality from another, scAEGAN outperforms Babel. We conclude that scAEGAN surpasses current state-of-the-art methods and unifies the integration and prediction challenges.
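The architecture summarized in the abstract (one autoencoder per condition, plus a cycleGAN that maps between their latent codes) can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the single tanh layers, random untrained weights, latent size of 16, L1 cycle-consistency loss, least-squares adversarial loss, cycle weight `lam = 10`, and every name (`dense`, `AutoEncoder`, `cycle_loss`, `lsgan_generator_loss`) are hypothetical choices borrowed from the generic CycleGAN recipe. Training is omitted; the sketch only shows how data flows and which loss terms a latent-space cGAN would minimize.

```python
import numpy as np

# Minimal sketch; layer sizes, activations, loss choices and names are
# assumptions, not scAEGAN's actual code. Weights are random and untrained.
rng = np.random.default_rng(0)


def dense(n_in, n_out, activation=np.tanh):
    """Hypothetical one-layer block with random, untrained weights."""
    w = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
    return lambda x: activation(x @ w)


class AutoEncoder:
    """One AE per condition: encoder to a low-dimensional embedding, decoder back."""

    def __init__(self, n_features, n_latent):
        self.encode = dense(n_features, n_latent)
        self.decode = dense(n_latent, n_features)

    def reconstruction_loss(self, x):
        # Mean squared error between the input and its reconstruction.
        return float(np.mean((self.decode(self.encode(x)) - x) ** 2))


def cycle_loss(z_a, z_b, g_ab, g_ba):
    """L1 cycle consistency in latent space: A -> B -> A and B -> A -> B."""
    return float(np.mean(np.abs(g_ba(g_ab(z_a)) - z_a))
                 + np.mean(np.abs(g_ab(g_ba(z_b)) - z_b)))


def lsgan_generator_loss(d, z_fake):
    """Least-squares adversarial loss: the generator wants D(fake) close to 1."""
    return float(np.mean((d(z_fake) - 1.0) ** 2))


# Toy log-transformed count matrices (cells x features) for two conditions
# with different feature spaces, e.g. two library protocols or two modalities.
x_a = np.log1p(rng.poisson(1.0, size=(50, 200)).astype(float))
x_b = np.log1p(rng.poisson(1.0, size=(40, 150)).astype(float))

latent = 16
ae_a, ae_b = AutoEncoder(200, latent), AutoEncoder(150, latent)
g_ab, g_ba = dense(latent, latent), dense(latent, latent)  # latent-space generators
identity = lambda v: v
d_a, d_b = dense(latent, 1, identity), dense(latent, 1, identity)  # latent discriminators

z_a, z_b = ae_a.encode(x_a), ae_b.encode(x_b)
recon = ae_a.reconstruction_loss(x_a) + ae_b.reconstruction_loss(x_b)

lam = 10.0  # cycle-loss weight; the common CycleGAN default, assumed here
generator_objective = (lsgan_generator_loss(d_b, g_ab(z_a))
                       + lsgan_generator_loss(d_a, g_ba(z_b))
                       + lam * cycle_loss(z_a, z_b, g_ab, g_ba))

# Cross-condition prediction: encode with AE_A, map with G_AB, decode with AE_B.
pred_b = ae_b.decode(g_ab(ae_a.encode(x_a)))
print(pred_b.shape)  # (50, 150)
```

Mapping between latent codes rather than raw features keeps the generators small and lets the two conditions have different numbers of features, as in the toy example (200 vs. 150). Whether scAEGAN trains the AEs and the cGAN jointly or in sequence is not stated in this abstract, so the sketch leaves training out.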