Risso Davide, Ngai John, Speed Terence P, Dudoit Sandrine
Department of Statistics, University of California, Berkeley, Berkeley, California, USA.
[1] Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA. [2] Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California, USA. [3] Functional Genomics Laboratory, University of California, Berkeley, Berkeley, California, USA.
Nat Biotechnol. 2014 Sep;32(9):896-902. doi: 10.1038/nbt.2931. Epub 2014 Aug 24.
Normalization of RNA-sequencing (RNA-seq) data has proven essential to ensure accurate inference of expression levels. Here, we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more complex unwanted technical effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, called remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple laboratories, technicians, and/or sequencing platforms.
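The core idea of the control-gene variant (factor analysis on genes assumed unaffected by the biology of interest, then regressing the estimated unwanted factors out of every gene) can be illustrated with a minimal NumPy sketch. It rests on several assumptions. It uses a truncated SVD of gene-centered log counts as the factor analysis and ordinary least squares to remove the factors. It ignores the covariates of interest during removal, so it presumes they are not confounded with the unwanted factors. It adds a log offset. The function name `ruv_control_genes` and its parameters are hypothetical. This is not the authors' RUVSeq Bioconductor implementation.

```python
import numpy as np


def ruv_control_genes(counts, control_idx, k=1, offset=1.0):
    """Hypothetical, simplified control-gene RUV sketch (not the RUVSeq implementation).

    counts: (n_samples, n_genes) array of read counts.
    control_idx: column indices of genes assumed NOT to respond to the biology of interest
        (e.g., ERCC spike-ins).
    k: number of unwanted factors to estimate.
    offset: pseudo-count added before taking logs (assumption).
    Returns (normalized log expression, estimated unwanted factors W of shape (n_samples, k)).
    """
    log_y = np.log(counts + offset)
    # Center each gene across samples so the factors capture between-sample variation.
    centered = log_y - log_y.mean(axis=0, keepdims=True)
    # Factor analysis via truncated SVD restricted to the control genes.
    u, s, _ = np.linalg.svd(centered[:, control_idx], full_matrices=False)
    w = u[:, :k] * s[:k]  # estimated unwanted factors, one column per factor
    # Least-squares fit of every gene on W, then subtract the fitted unwanted component.
    alpha, *_ = np.linalg.lstsq(w, centered, rcond=None)  # shape (k, n_genes)
    normalized = log_y - w @ alpha
    return normalized, w
```

In a toy setting where a batch effect shifts all genes and a balanced biological effect touches only non-control genes, the sketch removes the batch shift while leaving the biological difference intact. The control genes carry only the unwanted signal, so the first factor they yield spans the batch direction.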