Huang Zimo, Wang Jun, Lu Xudong, Mohd Zain Azlan, Yu Guoxian
MEng student at School of Software, Shandong University, China.
Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad040.
Single-cell RNA sequencing (scRNA-seq) data typically contain a large number of missing values, which often results in the loss of critical gene signaling information and seriously limits downstream analysis. Deep learning-based imputation methods often handle scRNA-seq data better than shallow ones, but most of them do not consider the inherent relations between genes, even though the expression of a gene is often regulated by other genes. Therefore, it is essential to impute scRNA-seq data by considering the regional gene-to-gene relations. We propose a novel model (named scGGAN) to impute scRNA-seq data that learns the gene-to-gene relations with Graph Convolutional Networks (GCN) and the global scRNA-seq data distribution with Generative Adversarial Networks (GAN). scGGAN first leverages single-cell and bulk genomics data to explore inherent relations between genes and builds a more compact gene relation network to jointly capture the homogeneous and heterogeneous information. Then, it constructs a GCN-based GAN model that integrates the scRNA-seq data, gene sequencing data and gene relation network to generate scRNA-seq data, and trains the model through adversarial learning. Finally, it uses data generated by the trained GCN-based GAN model to impute the scRNA-seq data. Experiments on simulated and real scRNA-seq datasets show that scGGAN can effectively identify dropout events, recover biologically meaningful expressions, determine subcellular states and types, and improve differential expression analysis and temporal dynamics analysis. Ablation experiments confirm that both the gene relation network and gene sequence data help the imputation of scRNA-seq data.
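The two ideas the abstract combines, propagating expression over a gene relation network with a graph convolution and then filling missing entries with generated values, can be illustrated with a minimal NumPy sketch. This is not the scGGAN implementation: the abstract does not specify the architecture, so the standard symmetric-normalized GCN propagation rule, the untrained random weight matrix standing in for a trained generator, the toy matrices, and the function names (`normalize_adjacency`, `gcn_layer`, `impute_zeros`) are all assumptions made for illustration. In particular, the sketch fills every zero, whereas the abstract states that scGGAN identifies which zeros are dropout events.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalization (assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so every degree is >= 1
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features, adj_norm, weight):
    """One graph convolution: ReLU(A_norm @ X @ W). Rows of X are genes (nodes)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


def impute_zeros(observed, generated):
    """Replace zero entries of `observed` with `generated`; keep observed nonzeros."""
    return np.where(observed == 0, generated, observed)


# Hypothetical toy expression matrix: 3 cells x 4 genes, zeros are missing values.
expr = np.array([
    [5.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 2.0],
])

# Hypothetical gene relation network: a chain gene0 - gene1 - gene2 - gene3.
gene_adj = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

adj_norm = normalize_adjacency(gene_adj)

# Untrained random weights stand in for a generator trained adversarially.
rng = np.random.default_rng(0)
weight = rng.normal(size=(expr.shape[0], expr.shape[0]))

# Genes are nodes, so the layer takes a genes x cells matrix; transpose back after.
generated = gcn_layer(expr.T, adj_norm, weight).T

imputed = impute_zeros(expr, generated)
print(imputed)
```

Observed nonzero counts are left untouched and only zero entries take generated values, which mirrors the final step described above under the simplifying assumption that every zero is a dropout.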