Department of Otolaryngology - Head and Neck Surgery, Amsterdam UMC, Vrije Universiteit Amsterdam, De Boelelaan 1117, 1081 HV Amsterdam, The Netherlands.
Department of Epidemiology and Biostatistics, Amsterdam UMC, Vrije Universiteit Amsterdam, 1007 MB Amsterdam, The Netherlands.
BMC Bioinformatics. 2018 Aug 20;19(1):301. doi: 10.1186/s12859-018-2306-z.
Reproducibility of hits from independent CRISPR or siRNA screens is poor. This is partly because data normalization primarily addresses technical variability within individual screens, not the technical differences between them.
We present "rscreenorm", a method that standardizes the functional data ranges between screens using assay controls and subsequently performs a piecewise-linear normalization to make data distributions comparable across all screens. In simulation studies, rscreenorm reduced false positives. In two multiple-cell-line siRNA screens, rscreenorm increased reproducibility by 27 to 62% for hits and up to 5-fold for non-hits. On publicly available CRISPR-Cas screen data, the commonly used median centering yielded only 34% overlapping hits, whereas rscreenorm yielded 84%. Furthermore, rscreenorm yielded at most 8% discordant results, whilst median centering yielded as much as 55%.
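The two-step procedure described above (control-based range standardization, then piecewise-linear normalization) can be sketched as follows. This is a minimal illustration, not the rscreenorm package or its exact algorithm: the control convention (negative-control median mapped to 0, positive-control median mapped to -1), placing knots at evenly spaced quantiles, and using the mean knot values as the shared reference are all assumptions made for the example, and the function names are hypothetical. Median centering is included only as the baseline the abstract compares against.

```python
import numpy as np


def control_standardize(values, neg_ctrl, pos_ctrl):
    """Rescale one screen using its assay controls.

    Assumed convention (illustrative only): the negative-control median maps
    to 0 and the positive-control median maps to -1, so every screen shares
    the same functional data range."""
    neg = np.median(neg_ctrl)
    pos = np.median(pos_ctrl)
    return (np.asarray(values, dtype=float) - neg) / (neg - pos)


def piecewise_linear_normalize(screens, n_knots=5):
    """Map every screen onto a common reference distribution.

    Assumption: knots sit at evenly spaced quantiles (0, 0.25, ..., 1 for
    n_knots=5), the reference knot values are the mean of the per-screen
    knot values, and np.interp interpolates linearly between knots."""
    probs = np.linspace(0.0, 1.0, n_knots)
    knots = [np.quantile(s, probs) for s in screens]
    reference = np.mean(knots, axis=0)
    return [np.interp(s, k, reference) for s, k in zip(screens, knots)]


def median_center(values):
    """Baseline: subtract the screen median (removes shifts, not scale)."""
    v = np.asarray(values, dtype=float)
    return v - np.median(v)


# Toy data: screen B is screen A measured on a different scale and offset.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 101)
neg_a, pos_a = rng.normal(1.0, 0.1, 8), rng.normal(-2.0, 0.1, 8)
b, neg_b, pos_b = 2 * a + 3, 2 * neg_a + 3, 2 * pos_a + 3

std_a = control_standardize(a, neg_a, pos_a)
std_b = control_standardize(b, neg_b, pos_b)
print(np.allclose(std_a, std_b))  # True: controls undo the affine difference

# Median centering alone leaves the 2x scale difference in place.
print(np.allclose(median_center(a), median_center(b)))  # False

# A third, skewed screen is aligned to screen A by the piecewise-linear step.
c = control_standardize(rng.exponential(1.0, 101) - 1.0, neg_a, pos_a)
norm_a, norm_c = piecewise_linear_normalize([std_a, c])
print(np.isclose(np.median(norm_a), np.median(norm_c)))  # True
```

Because each screen's own quantiles are used as knots, the mapping is monotone, so gene rankings within a screen are preserved while the minimum, median, and maximum of every screen land on the same reference values.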
rscreenorm yields more consistent results and keeps false-positive rates under control, improving the reproducibility of genetic screen data analysis from multiple cell lines.