Program in Applied Mathematics, Yale University, New Haven, CT, 06511, USA.
Interdepartmental Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA.
Nat Commun. 2022 Jan 11;13(1):192. doi: 10.1038/s41467-021-27729-z.
A key challenge in analyzing single-cell RNA-sequencing data is the large number of false zeros, where genes actually expressed in a given cell are incorrectly measured as unexpressed. We present a method based on low-rank matrix approximation that imputes these values while preserving biologically non-expressed genes (true biological zeros) at zero expression levels. We provide theoretical justification for this denoising approach and demonstrate its advantages relative to other methods on simulated and biological datasets.
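The general idea of low-rank imputation with zero preservation can be illustrated with a minimal Python sketch. The abstract gives no algorithmic details, so everything below is an assumption made for illustration: the function name `lowrank_impute`, the rank `k`, the quantile `q`, the per-gene thresholding rule, and the final step that restores observed nonzero values. The thresholding rule rests on one heuristic: if a gene is truly unexpressed in some cells, its low-rank reconstruction in those cells should scatter roughly symmetrically around zero, so the magnitude of the most negative reconstructed values gives a rough estimate of the noise level below which entries are set back to zero.

```python
import numpy as np

def lowrank_impute(X, k, q=0.001):
    """Illustrative sketch only (not the authors' implementation).

    X : (cells x genes) non-negative expression matrix.
    k : rank of the approximation (hypothetical parameter).
    q : lower quantile used to estimate each gene's noise level (assumption).
    """
    X = np.asarray(X, dtype=float)

    # Step 1: rank-k approximation via truncated SVD.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

    # Step 2: per-gene threshold, taken as the magnitude of the q-quantile
    # of that gene's reconstructed values (its most negative entries).
    thresh = np.abs(np.quantile(X_k, q, axis=0))

    # Step 3: entries at or below the threshold are treated as biological
    # zeros; the remaining entries keep their reconstructed values.
    X_imp = np.where(X_k > thresh, X_k, 0.0)

    # Step 4 (assumption): a value observed as nonzero is never zeroed out.
    X_imp = np.where((X > 0) & (X_imp == 0), X, X_imp)
    return X_imp
```

A call such as `lowrank_impute(counts, k=20)` on a normalized cells-by-genes matrix would return a matrix of the same shape. In practice the rank would have to be chosen from the data, which this sketch does not attempt.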