IEEE J Biomed Health Inform. 2024 Jun;28(6):3772-3780. doi: 10.1109/JBHI.2024.3383921. Epub 2024 Jun 6.
Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of gene expression at single-cell resolution. However, technical noise and data sparsity in scRNA-seq often undermine the accuracy of downstream analyses. Existing methods for denoising and imputing scRNA-seq data often rely on stringent assumptions about the data distribution, which limits how well the data can be recovered. In this study, we propose scDMAE, a model for denoising and recovering scRNA-seq data. First, the model fuses gene expression features with topological features to discern the primary expression patterns of genes in cells. Then, an autoencoder with a masking strategy models dropout events and separates potential noise from the data. Finally, the model incorporates the raw data to recover the true biological expression values. In experiments on various types of scRNA-seq datasets, scDMAE outperforms the competing methods on six distinct evaluation metrics in downstream analysis. The scDMAE method can accurately cluster similar cell populations, identify differentially expressed genes, and infer cell trajectories.
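The masking and recovery steps described in the abstract can be illustrated with a minimal sketch. This is not the scDMAE implementation: the abstract does not specify the architecture, loss, or hyperparameters, so the single-hidden-layer autoencoder, the squared-error loss on observed entries, the rule of keeping observed values and filling only zeros, and every function and parameter name below are assumptions made for illustration. The fusion with topological features is omitted entirely.

```python
import numpy as np


def mask_entries(x, mask_rate, rng):
    """Zero out a random fraction of the nonzero entries to imitate dropout."""
    mask = (rng.random(x.shape) < mask_rate) & (x > 0)
    corrupted = np.where(mask, 0.0, x)
    return corrupted, mask


def train_masked_autoencoder(x, hidden=8, mask_rate=0.3, epochs=200,
                             lr=0.01, seed=0):
    """Train a tiny tanh autoencoder on freshly masked copies of x.

    x is a (cells x genes) matrix of log-normalized expression values.
    The loss is squared error on the entries observed (nonzero) in x,
    so the network must predict masked values from the remaining genes.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = x.shape
    w_enc = rng.normal(0.0, 0.1, (n_genes, hidden))
    w_dec = rng.normal(0.0, 0.1, (hidden, n_genes))
    observed = x > 0
    for _ in range(epochs):
        corrupted, _ = mask_entries(x, mask_rate, rng)
        h = np.tanh(corrupted @ w_enc)           # encoder
        recon = h @ w_dec                        # linear decoder
        err = (recon - x) * observed             # ignore unobserved zeros
        # Gradients of 0.5 * sum(err**2) / n_cells, computed before updating.
        grad_dec = h.T @ err / n_cells
        grad_h = (err @ w_dec.T) * (1.0 - h ** 2)
        grad_enc = corrupted.T @ grad_h / n_cells
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec


def recover(x, w_enc, w_dec):
    """Keep observed raw values; fill zeros with non-negative predictions."""
    pred = np.clip(np.tanh(x @ w_enc) @ w_dec, 0.0, None)
    return np.where(x > 0, x, pred)
```

A call such as `recover(x, *train_masked_autoencoder(x))` returns a matrix of the same shape as `x` in which every nonzero input value is unchanged and only the zeros are replaced. Treating every zero as a candidate dropout is a simplification, since some zeros reflect genes that are truly not expressed.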