scIDPMs: Single-Cell RNA-Seq Imputation Using Diffusion Probabilistic Models.

Authors

Zhang Zhiqiang, Liu Lin

Publication information

IEEE J Biomed Health Inform. 2025 Apr;29(4):3057-3068. doi: 10.1109/JBHI.2024.3430554. Epub 2025 Apr 4.

Abstract

Single-cell RNA sequencing (scRNA-seq) technology has revolutionized biological research by enabling the sequencing of mRNA in individual cells, thereby providing valuable insights into cellular gene expression and function. However, scRNA-seq data often contain false zero values, known as dropout events, which can obscure true gene expression levels and compromise the accuracy of downstream analysis. To address this issue, several computational approaches have been proposed for imputing missing gene expression values. Nevertheless, these methods struggle to capture the distribution of dropout values because of the sparsity of scRNA-seq data and the complexity of gene expression patterns. In this study, we present a novel method called scIDPMs that uses conditional diffusion probabilistic models to impute scRNA-seq data. scIDPMs first identifies dropout sites based on gene expression characteristics and then infers the missing values from the available gene expression information. To capture global gene expression features effectively, scIDPMs employs a deep neural network with an attention mechanism to optimize the imputation process. We evaluated the performance of scIDPMs on simulated and real scRNA-seq datasets and compared it with ten other imputation methods. The results indicate that scIDPMs outperforms the other methods in restoring biologically meaningful gene expression values and improving downstream analysis.
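
The two-stage idea in the abstract (flag likely dropout sites, then impute them conditioned on the remaining values) can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-expression heuristic, the threshold `min_gene_mean`, and the function names `candidate_dropout_mask` and `init_reverse_process` are all assumptions made for illustration, and the trained denoising network that a conditional diffusion model would apply step by step is omitted. The sketch only shows how a count matrix might be split into conditioning entries and imputation targets, with the targets initialized to Gaussian noise as the starting point of a reverse diffusion process.

```python
import random


def candidate_dropout_mask(counts, min_gene_mean=1.0):
    """Flag zeros that sit in otherwise well-expressed genes.

    Hypothetical heuristic (an assumption, not taken from the paper):
    a zero is a candidate dropout if the gene's mean over all cells
    is at least min_gene_mean. counts is a list of cells, each a list
    of per-gene values.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    gene_means = [
        sum(cell[g] for cell in counts) / n_cells for g in range(n_genes)
    ]
    return [
        [cell[g] == 0 and gene_means[g] >= min_gene_mean for g in range(n_genes)]
        for cell in counts
    ]


def init_reverse_process(counts, mask, rng):
    """Build the starting state for conditional imputation.

    Masked entries (imputation targets) are replaced by N(0, 1) noise;
    unmasked entries are kept unchanged and serve as the condition.
    A trained denoiser (omitted here) would then refine the targets.
    """
    return [
        [rng.gauss(0.0, 1.0) if is_target else value
         for value, is_target in zip(row, mask_row)]
        for row, mask_row in zip(counts, mask)
    ]


# Toy matrix: 3 cells x 3 genes. Gene 1 is never expressed, so its
# zeros are treated as true zeros rather than dropouts.
counts = [
    [5, 0, 0],
    [4, 0, 7],
    [0, 0, 9],
]
mask = candidate_dropout_mask(counts)
start = init_reverse_process(counts, mask, random.Random(0))
```

In this toy matrix only the zero in the first cell's third gene and the zero in the third cell's first gene are flagged; every other entry passes through unchanged and would be supplied to the model as conditioning information.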
