Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
Department of Statistics, Texas A&M University, College Station, TX 77843, USA.
Nucleic Acids Res. 2023 Jul 21;51(13):6578-6592. doi: 10.1093/nar/gkad450.
In this paper, we introduce Gene Knockout Inference (GenKI), a virtual knockout (KO) tool for gene function prediction using single-cell RNA sequencing (scRNA-seq) data when KO samples are absent and only wild-type (WT) samples are available. Without using any information from real KO samples, GenKI is designed to capture, in an unsupervised manner, the shifts in gene regulation caused by a KO perturbation, and to provide a robust and scalable framework for gene function studies. To achieve this goal, GenKI adapts a variational graph autoencoder (VGAE) model to learn latent representations of genes and of the interactions between genes from the input WT scRNA-seq data and a derived single-cell gene regulatory network (scGRN). The virtual KO data are then generated by computationally removing all edges of the KO gene (the gene to be knocked out for functional study) from the scGRN. The differences between the WT and virtual KO data are discerned by comparing their corresponding latent parameters derived from the trained VGAE model. Our simulations show that GenKI accurately approximates the perturbation profiles upon gene KO and outperforms state-of-the-art methods under a series of evaluation conditions. Using publicly available scRNA-seq data sets, we demonstrate that GenKI recapitulates discoveries of real-animal KO experiments and accurately predicts cell type-specific functions of KO genes. Thus, GenKI provides an in-silico alternative to KO experiments that may partially replace the need for genetically modified animals or other genetically perturbed systems.
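The two computational steps named above, removing every edge of the KO gene from the scGRN and comparing WT and virtual KO latent parameters, can be sketched as follows. This is a minimal illustration under stated assumptions, not GenKI's implementation or API: the function names are hypothetical, the scGRN is assumed to be a dense adjacency matrix, the latent parameters are assumed to be per-gene diagonal Gaussians (means and log-variances, as a VGAE encoder produces), and the comparison is assumed to use the KL divergence. The VGAE itself is not implemented; the latent numbers in the demo are placeholders standing in for encoder outputs.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def virtual_knockout(adj: Matrix, ko_index: int) -> Matrix:
    """Hypothetical helper: copy the scGRN adjacency and drop every edge of the KO gene.

    Zeroes both the row (outgoing edges) and the column (incoming edges)
    of gene `ko_index`; the WT matrix passed in is left unchanged.
    """
    ko = [row[:] for row in adj]
    for j in range(len(ko)):
        ko[ko_index][j] = 0.0
        ko[j][ko_index] = 0.0
    return ko


def gaussian_kl(mu_p: Sequence[float], logvar_p: Sequence[float],
                mu_q: Sequence[float], logvar_q: Sequence[float]) -> float:
    """KL(P || Q) between two diagonal Gaussians given means and log-variances."""
    kl = 0.0
    for mp, lp, mq, lq in zip(mu_p, logvar_p, mu_q, logvar_q):
        var_p, var_q = math.exp(lp), math.exp(lq)
        # 0.5 * (log(var_q / var_p) + (var_p + (mp - mq)^2) / var_q - 1)
        kl += 0.5 * (lq - lp + (var_p + (mp - mq) ** 2) / var_q - 1.0)
    return kl


def rank_by_kl(genes: Sequence[str],
               wt_mu: Sequence[Sequence[float]], wt_logvar: Sequence[Sequence[float]],
               ko_mu: Sequence[Sequence[float]], ko_logvar: Sequence[Sequence[float]],
               ) -> List[Tuple[str, float]]:
    """Hypothetical helper: score each gene by KL(WT || KO) and sort, largest shift first."""
    scores: Dict[str, float] = {
        g: gaussian_kl(wt_mu[i], wt_logvar[i], ko_mu[i], ko_logvar[i])
        for i, g in enumerate(genes)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy 4-gene scGRN; gene 0 is the KO gene, and the 2-3 edge should survive.
    wt_adj = [[0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]]
    print(virtual_knockout(wt_adj, 0))

    # Placeholder 2-D latent parameters (NOT real model outputs).
    genes = ["A", "B", "C"]
    zeros = [[0.0, 0.0]] * 3
    ko_mu = [[1.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
    print(rank_by_kl(genes, zeros, zeros, ko_mu, zeros))
```

Genes whose latent distribution moves most once the KO gene's edges are removed rank highest; in a real analysis those latent parameters would come from running the trained encoder on the WT graph and on the virtual KO graph.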