Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA.
Genome Biol. 2019 May 6;20(1):88. doi: 10.1186/s13059-019-1681-8.
Single-cell RNA-seq data contain a large proportion of zeros for expressed genes. Such dropout events present a fundamental challenge for various types of data analyses. Here, we describe the SCRABBLE algorithm to address this problem. SCRABBLE leverages bulk data as a constraint and reduces unwanted bias towards expressed genes during imputation. Using both simulation and several types of experimental data, we demonstrate that SCRABBLE outperforms existing methods in recovering dropout events, capturing the true distribution of gene expression across cells, and preserving gene-gene and cell-cell relationships in the data.
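The idea of using bulk data as a constraint during imputation can be illustrated with a toy example. The sketch below is not the SCRABBLE objective or implementation, which the abstract does not specify. It assumes a simple least-squares model in which observed non-zero entries are kept close to their measured values while each gene's average across cells is pulled towards a bulk measurement. The function names (`simulate`, `impute_with_bulk`), the gamma-distributed expression values, and the parameters `beta`, `lr`, and `n_iter` are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n_genes=50, n_cells=200, dropout_rate=0.6, seed=0):
    """Simulate a genes x cells matrix, its bulk profile, and random dropouts."""
    rng = np.random.default_rng(seed)
    # Assumed toy model: strictly positive "true" expression values.
    true = rng.gamma(shape=2.0, scale=2.0, size=(n_genes, n_cells))
    # Assumed toy model: bulk expression is the per-gene average over cells.
    bulk = true.mean(axis=1)
    # Each entry is independently zeroed out with probability dropout_rate.
    keep = rng.random(true.shape) >= dropout_rate
    observed = true * keep
    return true, observed, bulk


def impute_with_bulk(observed, bulk, beta=1.0, lr=0.1, n_iter=500):
    """Projected gradient descent on a toy bulk-constrained objective.

    Minimizes, subject to x >= 0,
        0.5 * sum(mask * (x - observed)**2)
        + 0.5 * beta * n_cells * sum((x.mean(axis=1) - bulk)**2)
    where mask marks the observed non-zero entries.
    """
    mask = observed > 0
    x = observed.astype(float).copy()
    for _ in range(n_iter):
        # Data-fit gradient: only observed non-zero entries contribute.
        grad_fit = mask * (x - observed)
        # Bulk gradient: per-gene gap between the cell average and the bulk value,
        # applied equally to every cell of that gene.
        gap = x.mean(axis=1) - bulk
        grad_bulk = beta * gap[:, None]
        # Gradient step followed by projection onto non-negative values.
        x = np.maximum(x - lr * (grad_fit + grad_bulk), 0.0)
    return x


if __name__ == "__main__":
    true, observed, bulk = simulate()
    imputed = impute_with_bulk(observed, bulk)
    print("zero fraction before imputation:", np.mean(observed == 0))
    print("mean abs error before:", np.abs(observed - true).mean())
    print("mean abs error after: ", np.abs(imputed - true).mean())
```

In this toy setting the bulk term can only shift all zero entries of a gene by the same amount, so it recovers the average level of the dropped values but not their cell-to-cell variation. Recovering that variation requires additional structure across genes and cells, which this sketch deliberately leaves out.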