Department of Biostatistics and Programming, Sanofi, Framingham, MA 01701, USA.
Department of Biostatistics, Columbia University, New York, NY 10032, USA.
Bioinformatics. 2020 May 1;36(10):3124-3130. doi: 10.1093/bioinformatics/btaa098.
Single-cell RNA sequencing (scRNA-seq) has enabled the simultaneous transcriptomic profiling of individual cells under different biological conditions. scRNA-seq data have two unique challenges that can affect the sensitivity and specificity of single-cell differential expression analysis: a large proportion of expressed genes with zero or low read counts ('dropout' events) and multimodal data distributions.
We have developed a zero-inflation-adjusted quantile (ZIAQ) algorithm, which is the first method to account for both dropout rates and complex scRNA-seq data distributions in the same model. On simulated scRNA-seq datasets, ZIAQ outperforms several existing methods, detecting more differentially expressed genes. When ZIAQ was applied to the comparison of neoplastic and non-neoplastic cells from a human glioblastoma dataset, the ranking of biologically relevant genes and pathways showed clear improvement over existing methods.
ZIAQ is implemented in the R language and available at https://github.com/gefeizhang/ZIAQ.
Supplementary data are available at Bioinformatics online.