School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China; Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China.
School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China; School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, 310023, China.
Comput Biol Med. 2023 Sep;164:107263. doi: 10.1016/j.compbiomed.2023.107263. Epub 2023 Jul 23.
Single-cell RNA-sequencing (scRNA-seq) technology has revolutionized the study of cell heterogeneity and biological interpretation at the single-cell level. However, the dropout events commonly present in scRNA-seq data can markedly reduce the reliability of downstream analysis. Existing imputation methods often overlook the discrepancy between the cell relationships inferred from dropout-corrupted data and the true relationships, so they learn untrustworthy cell representations, which limits their performance.
Here, we propose CL-Impute (Contrastive Learning-based Impute), a novel model for estimating missing gene expression values without relying on preconstructed cell relationships. CL-Impute combines contrastive learning and a self-attention network to address this challenge. Specifically, it uses contrastive learning to learn cell representations from the self-perspective of dropout events, whereas the self-attention network captures cell relationships from the global perspective.
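To make the global perspective concrete, the following is a minimal sketch of scaled dot-product self-attention over a batch of cell embeddings, in which each cell is re-expressed as a weighted mix of all cells in the batch. It is an illustration under stated assumptions, not the CL-Impute implementation: the function names `softmax` and `self_attention` are hypothetical, and queries, keys, and values are taken to be the embeddings themselves, with no learned projections.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(cells):
    """Scaled dot-product self-attention across cells.

    Assumption: queries, keys, and values are the cell embeddings
    themselves (no learned projection matrices), to keep the sketch short.
    Returns the attended embeddings and the cell-to-cell weight matrix.
    """
    dim = len(cells[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for query in cells:
        # Similarity of this cell to every cell in the batch.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in cells]
        row = softmax(scores)
        weights.append(row)
        # Weighted average of all cell embeddings.
        attended.append(
            [sum(row[j] * cells[j][i] for j in range(len(cells))) for i in range(dim)]
        )
    return attended, weights


# Hypothetical toy embeddings: cells 0 and 1 are similar, cell 2 is not.
cells = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attended, weights = self_attention(cells)
```

In this toy batch, cell 0 places more attention weight on the similar cell 1 than on the dissimilar cell 2, which is the sense in which attention weights, rather than a preconstructed graph, express cell relationships.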
Experiments on four benchmark datasets, covering quantitative assessment, cell clustering, gene identification, and trajectory inference, demonstrate that CL-Impute outperforms existing state-of-the-art imputation methods. Furthermore, our experiments reveal that combining contrastive learning with masking-based cell augmentation enables the model to learn actual latent features from noisy data with a high rate of dropout events, enhancing the reliability of imputed values.
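The combination of masking augmentation and contrastive learning can be sketched as follows. This is a hedged illustration rather than the authors' code: the helper names `mask_genes`, `cosine`, and `info_nce`, the mask rate, and the temperature are assumptions, an InfoNCE-style loss stands in for whatever contrastive objective CL-Impute actually uses, and raw expression vectors are used in place of encoder outputs.

```python
import math
import random


def mask_genes(cell, mask_rate, rng):
    """Simulate dropout by zeroing each gene value with probability mask_rate."""
    return [0.0 if rng.random() < mask_rate else value for value in cell]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce(anchors, views, temperature=0.5):
    """Mean InfoNCE-style loss.

    views[i] is the positive for anchors[i]; every other view in the
    batch acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, view) / temperature for view in views]
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy expression vectors (3 cells x 4 genes).
cells = [[5.0, 0.0, 1.0, 0.0], [0.0, 4.0, 0.0, 2.0], [1.0, 1.0, 6.0, 0.0]]
rng = random.Random(0)
masked_views = [mask_genes(cell, 0.3, rng) for cell in cells]
loss = info_nce(cells, masked_views)
```

Minimizing such a loss pulls each cell toward its own masked view and away from the other cells in the batch, which is one way to encourage representations that are stable under dropout.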
CL-Impute is a novel contrastive learning-based method for imputing scRNA-seq data with high dropout rates. The source code of CL-Impute is available at https://github.com/yuchen21-web/Imputation-for-scRNA-seq.