Zhou Xiaobei, Lindsay Helen, Robinson Mark D
Institute of Molecular Life Sciences, University of Zurich, CH-8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland.
Nucleic Acids Res. 2014 Jun;42(11):e91. doi: 10.1093/nar/gku310. Epub 2014 Apr 20.
A popular approach for comparing gene expression levels between (replicated) conditions in RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of 'sharing of information' across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and with simulated data that reflect properties of real datasets (e.g. the dispersion-mean trend), and we develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/.
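The abstract does not spell out how the observation weights are computed. As a minimal sketch of the general idea only, the Python code below makes several illustrative assumptions that do not come from this abstract. It uses Huber-type weights computed from Pearson residuals under a negative binomial model with variance mu + phi * mu^2. It also assumes a fixed, known dispersion phi and a single-group design. Under those assumptions it iteratively downweights an outlying count when estimating a group mean. The function names are hypothetical and are not the API of any published package.

```python
import math


def nb_pearson_residuals(counts, means, dispersion):
    """Pearson residuals under a negative binomial model with
    variance mu + dispersion * mu**2 (assumed parameterization)."""
    return [(y - mu) / math.sqrt(mu + dispersion * mu * mu)
            for y, mu in zip(counts, means)]


def huber_weights(residuals, k=1.345):
    """Huber weight: 1 when |r| <= k, otherwise k / |r|."""
    return [1.0 if abs(r) <= k else k / abs(r) for r in residuals]


def robust_group_mean(counts, dispersion, k=1.345, max_iter=6):
    """Hypothetical helper: iteratively reweighted estimate of a common
    mean for one group of replicates, with the dispersion held fixed.

    For a single group with constant dispersion, the weighted NB
    maximum-likelihood estimate of the mean is the weighted average
    of the counts, so each iteration alternates residuals -> weights
    -> weighted average.
    """
    n = len(counts)
    mu = sum(counts) / n  # start from the ordinary (unweighted) mean
    weights = [1.0] * n
    for _ in range(max_iter):
        res = nb_pearson_residuals(counts, [mu] * n, dispersion)
        weights = huber_weights(res, k)
        mu = sum(w * y for w, y in zip(weights, counts)) / sum(weights)
    return mu, weights


# Toy data: four consistent replicates and one outlying count.
counts = [100, 110, 95, 105, 1000]
mu, weights = robust_group_mean(counts, dispersion=0.05)
print(f"unweighted mean: {sum(counts) / len(counts):.1f}")
print(f"robust mean: {mu:.1f}")
print("weights:", [round(w, 3) for w in weights])
```

With these toy counts, the outlying value receives a small weight, while the four consistent replicates keep a weight of 1. The weighted mean therefore stays close to those replicates, whereas the unweighted mean is 282. In a real analysis, weights of this kind would feed into a full generalized linear model fit and into dispersion estimation rather than a single fixed-dispersion mean. This sketch simplifies that step deliberately.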