Peixoto Lucia, Risso Davide, Poplawski Shane G, Wimmer Mathieu E, Speed Terence P, Wood Marcelo A, Abel Ted
Department of Biology, University of Pennsylvania, Smilow Center for Translational Research, Room 10-170, Building 421, 3400 Civic Center Boulevard, Philadelphia, PA 19104-6168, USA.
Division of Biostatistics, School of Public Health, University of California, Berkeley, 344 Li Ka Shing Center, #3370, Berkeley, CA 94720-3370, USA.
Nucleic Acids Res. 2015 Sep 18;43(16):7664-74. doi: 10.1093/nar/gkv736. Epub 2015 Jul 21.
Sequencing of the full transcriptome (RNA-seq) has become the preferred method for measuring genome-wide gene expression. Despite its widespread use, challenges remain in RNA-seq data analysis. One often-overlooked aspect is normalization. Although a variety of factors, or 'batch effects', can contribute unwanted variation to the data, commonly used RNA-seq normalization methods correct only for sequencing depth. Studying gene expression is particularly problematic when it is influenced simultaneously by a variety of biological factors in addition to the one of interest. Using examples from experimental neuroscience, we show that batch effects can dominate the signal of interest and that the choice of normalization method affects the power and reproducibility of the results. While commonly used global normalization methods cannot adequately normalize the data, more recently developed RNA-seq normalization methods can. We focus on one particular method, RUVSeq, and show that it increases the power and biological insight of the results. Finally, we provide a tutorial outlining the implementation of RUVSeq normalization that is applicable to a broad range of studies as well as to meta-analyses of publicly available data.
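The contrast the abstract draws between depth-only normalization and removal of unwanted variation can be illustrated with a minimal sketch. This is not the RUVSeq implementation (RUVSeq is an R/Bioconductor package); it is a simplified NumPy illustration in the same spirit, and it assumes that a set of negative control genes, unaffected by the factor of interest, is available. The function names `upper_quartile_scale` and `remove_unwanted_factors`, the `log(count + 1)` transform, and the choice of `k` are all assumptions made for this example.

```python
import numpy as np


def upper_quartile_scale(counts):
    """Depth-only scaling: equalize each sample's upper quartile.

    counts: array of shape (genes, samples) with non-negative values.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = counts.sum(axis=1) > 0           # ignore genes with no reads at all
    uq = np.percentile(counts[expressed], 75, axis=0)
    factors = uq / uq.mean()                     # assumes every sample has uq > 0
    return counts / factors


def remove_unwanted_factors(counts, control_idx, k=1):
    """Estimate k unwanted factors from control genes and regress them out.

    Illustrative only: the factors are the top-k left singular vectors of the
    centered log counts restricted to the (assumed) negative control genes.
    Returns corrected log counts (genes x samples) and the factors W
    (samples x k).
    """
    counts = np.asarray(counts, dtype=float)
    Y = np.log(counts + 1.0).T                   # samples x genes
    Y_centered = Y - Y.mean(axis=0)              # center each gene across samples

    # Variation shared by control genes is taken to be unwanted variation.
    U, s, _ = np.linalg.svd(Y_centered[:, control_idx], full_matrices=False)
    W = U[:, :k] * s[:k]                         # samples x k

    # Per-gene loadings on the unwanted factors, by least squares.
    alpha, *_ = np.linalg.lstsq(W, Y_centered, rcond=None)

    corrected = Y - W @ alpha                    # gene means are preserved
    return corrected.T, W
```

In this sketch, `upper_quartile_scale` adjusts only for differences in sequencing depth, whereas `remove_unwanted_factors` additionally subtracts variation that the control genes share across samples. In a downstream analysis the estimated `W` could instead be included as covariates in the differential expression model, which is a design choice rather than something this example settles.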