Davide Risso, Fanny Perraudeau, Svetlana Gribkova, Sandrine Dudoit, Jean-Philippe Vert
Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, 10065, USA.
Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, 94720, USA.
Nat Commun. 2018 Jan 18;9(1):284. doi: 10.1038/s41467-017-02554-5.
Single-cell RNA-sequencing (scRNA-seq) is a powerful high-throughput technique that enables researchers to measure genome-wide transcription levels at the resolution of single cells. Because of the low amount of RNA present in a single cell, some genes may fail to be detected even though they are expressed; these genes are usually referred to as dropouts. Here, we present a general and flexible zero-inflated negative binomial model (ZINB-WaVE), which leads to low-dimensional representations of the data that account for zero inflation (dropouts), over-dispersion, and the count nature of the data. We demonstrate, with simulated and real data, that the model and its associated estimation procedure are able to give a more stable and accurate low-dimensional representation of the data than principal component analysis (PCA) and zero-inflated factor analysis (ZIFA), without the need for a preliminary normalization step.
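The following sketch makes the count model in the abstract concrete. It computes the log-probability of a single read count under a zero-inflated negative binomial (ZINB) distribution. With probability pi, the count is a structural zero (a dropout). Otherwise, it is drawn from a negative binomial with mean mu and inverse dispersion theta, whose variance mu + mu**2/theta captures over-dispersion. This is only the textbook ZINB distribution with hand-picked parameters. The function names are hypothetical, and the code does not reproduce the ZINB-WaVE model, in which mu and pi vary by gene and cell and are linked to covariates and low-dimensional latent factors that are estimated from the data.

```python
import math


def nb_logpmf(y: int, mu: float, theta: float) -> float:
    """Log-probability of count y under a negative binomial with mean mu > 0
    and inverse dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(y + theta)
        - math.lgamma(theta)
        - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_logpmf(y: int, mu: float, theta: float, pi: float) -> float:
    """Log-probability of count y under a zero-inflated negative binomial.

    With probability pi (0 <= pi < 1) the count is a structural zero
    (dropout); otherwise it is drawn from NB(mu, theta).
    """
    log_nb = nb_logpmf(y, mu, theta)
    if y == 0:
        # A zero can come from the dropout component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A positive count can only come from the NB component.
    return math.log(1.0 - pi) + log_nb


def zinb_loglik(counts: list[int], mu: float, theta: float, pi: float) -> float:
    """Log-likelihood of independent counts that share the same (mu, theta, pi)."""
    return sum(zinb_logpmf(y, mu, theta, pi) for y in counts)


if __name__ == "__main__":
    # Illustrative counts for one gene across eight cells (made-up numbers).
    counts = [0, 0, 3, 1, 0, 5, 0, 2]
    print(zinb_loglik(counts, mu=2.0, theta=1.5, pi=0.3))
```

Setting pi = 0 recovers the plain negative binomial. For any pi > 0, the probability of observing a zero increases while the mean shrinks to (1 - pi) * mu. This is the extra mass of zeros that the abstract attributes to dropouts.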