Institute for Computational and Mathematical Engineering, Stanford University, Stanford, 94305 CA, USA.
Department of Radiation Oncology, Stanford University, Stanford, 94305 CA, USA.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae031.
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for gaining biological insights at the cellular level. However, due to technical limitations of existing sequencing technologies, low gene expression values are often omitted, leading to inaccurate gene counts. Existing methods, including advanced deep learning techniques, struggle to impute gene expression reliably because they lack mechanisms that explicitly account for the underlying biological knowledge of the system. It has long been recognized that gene-gene interactions can reflect underlying biological processes and provide discriminative signatures of cells. A genomic data analysis framework that leverages these gene-gene interactions is therefore highly desirable: by extracting and integrating intricate biological characteristics of the data, it could identify distinctive patterns more reliably. Here we tackle the problem in two steps that exploit the gene-gene interactions of the system. We first reposition the genes into a 2D grid such that their spatial configuration reflects their interactive relationships. Then, to alleviate the need for labeled ground-truth gene expression datasets, we employ a self-supervised 2D convolutional neural network to extract contextual features of the interactions from the spatially configured genes and impute the omitted values. Extensive experiments with both simulated and experimental scRNA-seq datasets demonstrate the superior performance of the proposed strategy over existing imputation methods.
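The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how gene-gene interactions are measured or how genes are placed on the grid, so the sketch assumes Pearson correlation as the interaction proxy and a simple spectral embedding snapped to grid rows and columns. The function names `genes_to_grid`, `cells_to_images`, and `make_masked_pairs` are hypothetical, and the convolutional network itself is omitted; only the construction of its self-supervised training pairs (hiding a fraction of observed values and asking a model to recover them) is shown.

```python
import numpy as np


def genes_to_grid(expr, side):
    """Assign each gene to a cell of a side x side grid.

    expr: (n_cells, n_genes) array with n_genes == side * side (assumption).
    Returns a (side, side) integer array of gene indices.
    """
    n_genes = expr.shape[1]
    if n_genes != side * side:
        raise ValueError("this sketch needs exactly side * side genes")
    # Assumed interaction proxy: Pearson correlation between genes.
    corr = np.nan_to_num(np.corrcoef(expr, rowvar=False))
    # Assumed placement: two leading eigenvectors give 2D gene coordinates.
    _, vecs = np.linalg.eigh(corr)
    coords = vecs[:, -2:]
    # Snap to the grid: split genes into rows by the first coordinate,
    # then order each row by the second coordinate.
    rows = np.argsort(coords[:, 0], kind="stable").reshape(side, side)
    grid = np.empty((side, side), dtype=int)
    for r in range(side):
        grid[r] = rows[r][np.argsort(coords[rows[r], 1], kind="stable")]
    return grid


def cells_to_images(expr, grid):
    """Turn each cell's expression vector into a 2D image via the grid."""
    return expr[:, grid]  # shape: (n_cells, side, side)


def make_masked_pairs(images, mask_frac, rng):
    """Build self-supervised pairs by zeroing a fraction of observed values.

    Returns (inputs, targets, hidden); a model would be trained to
    predict targets[hidden] from inputs.
    """
    hidden = (images > 0) & (rng.random(images.shape) < mask_frac)
    inputs = np.where(hidden, 0.0, images)
    return inputs, images, hidden


# Toy usage with synthetic counts (not real scRNA-seq data).
rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(50, 16)).astype(float)
grid = genes_to_grid(expr, side=4)
images = cells_to_images(expr, grid)
inputs, targets, hidden = make_masked_pairs(images, mask_frac=0.2, rng=rng)
```

Masking only entries that were actually observed means the training signal comes from the data itself, which is the sense in which no labeled ground-truth dataset is needed.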