Department of Automation, Xiamen University, Xiamen, Fujian, China.
National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae148.
Single-cell RNA sequencing (scRNA-seq) facilitates the study of cell type heterogeneity and the construction of cell atlases. However, owing to the limitations of the technique, many genes may be recorded with zero expression (dropout events), which biases downstream analyses and hinders the identification and characterization of cell types and cell functions. Although many imputation methods have been developed, their performance generally falls short of expectations across different kinds and dimensions of data and across application scenarios. Developing an accurate and robust imputation method for single-cell gene expression data therefore remains essential. To maintain the original cell-cell and gene-gene correlations and to leverage information from bulk RNA sequencing (bulk RNA-seq) data, we propose scINRB, a single-cell gene expression imputation method with network regularization and bulk RNA-seq data. scINRB adopts network-regularized non-negative matrix factorization to ensure that the imputed data maintain the cell-cell and gene-gene similarities and also stay close to the average gene expression calculated from bulk RNA-seq data. To evaluate its performance, we test scINRB on simulated and experimental datasets and compare it with other commonly used imputation methods. The results show that scINRB recovers gene expression accurately even at high dropout rates and high dimensionality, preserves cell-cell and gene-gene similarities, and improves various downstream analyses, including visualization, clustering and trajectory inference.
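To make the idea of network-regularized non-negative matrix factorization with a bulk constraint more concrete, the following is a minimal NumPy sketch. It is not the scINRB implementation. It assumes one plausible objective: a squared reconstruction error, graph-Laplacian penalties on the gene and cell factors, and a penalty tying the mean imputed expression of each gene to a bulk average. The function names (`cosine_similarity`, `objective`, `impute`), the cosine-similarity graphs, the multiplicative update rules, the weights `lam_g`, `lam_c`, `lam_b` and the simulated data are all assumptions made for illustration; the exact formulation in the paper may differ.

```python
import numpy as np

def cosine_similarity(A, eps=1e-12):
    """Row-wise cosine similarity with a zeroed diagonal (no self-loops)."""
    unit = A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)
    S = unit @ unit.T
    np.fill_diagonal(S, 0.0)
    return S

def objective(X, W, H, S_g, S_c, b, lam_g, lam_c, lam_b):
    """Assumed loss: fit + gene-graph + cell-graph + bulk-mean penalties."""
    L_g = np.diag(S_g.sum(axis=1)) - S_g   # gene graph Laplacian
    L_c = np.diag(S_c.sum(axis=1)) - S_c   # cell graph Laplacian
    fit = np.linalg.norm(X - W @ H) ** 2
    gene_pen = np.trace(W.T @ L_g @ W)
    cell_pen = np.trace(H @ L_c @ H.T)
    bulk_pen = np.linalg.norm((W @ H).mean(axis=1) - b) ** 2
    return fit + lam_g * gene_pen + lam_c * cell_pen + lam_b * bulk_pen

def impute(X, b, k=5, lam_g=0.1, lam_c=0.1, lam_b=1.0,
           n_iter=100, seed=0, eps=1e-10):
    """X: genes x cells (non-negative); b: per-gene bulk average (non-negative).

    Returns the imputed matrix W @ H and the loss recorded before the first
    update and after every iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    S_g, S_c = cosine_similarity(X), cosine_similarity(X.T)
    D_g, D_c = S_g.sum(axis=1), S_c.sum(axis=1)   # degree vectors
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    args = (S_g, S_c, b, lam_g, lam_c, lam_b)
    history = [objective(X, W, H, *args)]
    for _ in range(n_iter):
        h_bar = H.mean(axis=1)                    # mean cell loading, shape (k,)
        # Multiplicative update for W: negative gradient part / positive part.
        num = X @ H.T + lam_g * (S_g @ W) + lam_b * np.outer(b, h_bar)
        den = (W @ (H @ H.T) + lam_g * (D_g[:, None] * W)
               + lam_b * np.outer(W @ h_bar, h_bar) + eps)
        W *= num / den
        # Multiplicative update for H, using the refreshed W.
        WtW = W.T @ W
        num = W.T @ X + lam_c * (H @ S_c) + (lam_b / n) * (W.T @ b)[:, None]
        den = (WtW @ H + lam_c * (H * D_c[None, :])
               + (lam_b / n) * (WtW @ h_bar)[:, None] + eps)
        H *= num / den
        history.append(objective(X, W, H, *args))
    return W @ H, history

# Toy data (simulated, not from the paper): low-rank expression with dropouts.
rng = np.random.default_rng(42)
X_true = rng.random((30, 3)) @ rng.random((3, 40))
mask = rng.random(X_true.shape) > 0.4             # False marks a dropout
X_obs = X_true * mask
bulk_mean = X_true.mean(axis=1)                   # stand-in for bulk RNA-seq averages
imputed, history = impute(X_obs, bulk_mean, k=3)
print("loss before / after:", history[0], history[-1])
```

In this sketch the similarity graphs are computed from the observed (dropout-affected) matrix, and every entry is fitted rather than only the non-zero ones; both are simplifications chosen to keep the example short, and a practical method would likely treat observed zeros more carefully.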