Zhang Xiaohua Douglas, Yang Xiting Cindy, Chung Namjin, Gates Adam, Stec Erica, Kunapuli Priya, Holder Dan J, Ferrer Marc, Espeseth Amy S
Merck Research Laboratories, Biometrics Research, West Point, PA 19486, USA.
Pharmacogenomics. 2006 Apr;7(3):299-309. doi: 10.2217/14622416.7.3.299.
RNA interference (RNAi) high-throughput screening (HTS) experiments that use large libraries (more than 5,000 short interfering RNAs [siRNAs]) generate huge amounts of data. To use these data to identify the most effective siRNAs tested, it is critical to adopt and develop appropriate statistical methods. To address hit selection in RNAi HTS, we proposed a quartile-based method that is robust to outliers, true hits and nonsymmetrical data. We compared it with the more traditional tests, mean ± k standard deviation (SD) and median ± k median absolute deviation (MAD). The results suggested that, under the same preset error rate, the quartile-based method selected more hits than mean ± k SD. The number of hits selected by median ± k MAD was close to the number selected by the quartile-based method. Further analysis suggested that the quartile-based method had the greatest power to detect true hits, especially weak or moderate ones. Our investigation also suggested that platewise analysis (identifying effective siRNAs plate by plate) can adjust for systematic errors that differ between plates, whereas experimentwise analysis (identifying effective siRNAs in a single analysis of the entire experiment) cannot. However, experimentwise analysis may detect a cluster of true positive hits concentrated in one or several plates, whereas platewise analysis may not. To display hit selection results, we designed a specific figure called a plate-well series plot.
We therefore suggest the following strategy for hit selection in RNAi HTS experiments. First, choose the quartile-based method, or median ± k MAD, to identify effective siRNAs. Second, apply the chosen method experimentwise to transformed/normalized data, such as percentage inhibition, to check for possible clusters of hits. If a cluster of selected hits is observed, repeat the analysis on untransformed data to determine whether the cluster is an artifact in the data. If no clusters of hits are observed, select hits by performing platewise analysis on the transformed data. Third, use the plate-well series plot to visualize both the data and the hit selection results, and to check for artifacts.
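The three hit-selection rules compared in the abstract can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the authors' code. The quartile-based cutoffs are assumed to take the boxplot-like form Q1 − c·IQR and Q3 + c·IQR, with c chosen so that for normally distributed data they coincide with mean ± k SD (which gives c ≈ 1.7239 for k = 3). The MAD is scaled by about 1.4826 so that it estimates the SD under normality. The helper names and the plate values are hypothetical.

```python
from statistics import NormalDist, mean, median, quantiles, stdev

# Upper quartile of the standard normal, Phi^-1(0.75) ~= 0.6745.
Z75 = NormalDist().inv_cdf(0.75)


def mean_sd_bounds(values, k=3.0):
    """Classical cutoffs: mean +/- k * sample standard deviation."""
    m, s = mean(values), stdev(values)
    return m - k * s, m + k * s


def median_mad_bounds(values, k=3.0):
    """Robust cutoffs: median +/- k * MAD, with MAD scaled by 1/Z75 (~1.4826)
    so that it estimates the SD when the data are normal."""
    med = median(values)
    mad = median([abs(v - med) for v in values]) / Z75
    return med - k * mad, med + k * mad


def quartile_c(k=3.0):
    """Assumed constant c such that, for normal data, Q3 + c * IQR equals mean + k * SD.
    Q3 = mu + Z75 * sigma and IQR = 2 * Z75 * sigma, so c = (k - Z75) / (2 * Z75)."""
    return (k - Z75) / (2 * Z75)


def quartile_bounds(values, k=3.0):
    """Quartile-based cutoffs: Q1 - c * IQR and Q3 + c * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    c = quartile_c(k)
    return q1 - c * iqr, q3 + c * iqr


def select_hits(values, bounds):
    """Indices of wells falling outside the (lower, upper) cutoffs."""
    lower, upper = bounds
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical plate: 21 inactive wells, one moderate hit (15) and three strong hits.
values = [-3, -2, -1, 0, 1, 2, 3] * 3 + [15, 40, 60, 90]
for name, bounds_fn in [("mean +/- k SD", mean_sd_bounds),
                        ("median +/- k MAD", median_mad_bounds),
                        ("quartile-based", quartile_bounds)]:
    print(name, select_hits(values, bounds_fn(values)))
# mean +/- k SD [24]
# median +/- k MAD [21, 22, 23, 24]
# quartile-based [21, 22, 23, 24]
```

On this invented plate the three strong hits inflate the SD, so mean ± 3 SD flags only the strongest well (index 24), while both robust rules also flag the moderate hit at index 21. This is a toy case that is consistent with, but does not reproduce, the abstract's finding that the robust methods are better at detecting weak or moderate hits.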
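The contrast between platewise and experimentwise analysis, and the suggested strategy of thresholding normalized values such as percentage inhibition, can be sketched as follows. This is a hedged illustration, not the authors' implementation. Several elements are assumptions: quartile-based cutoffs of the form Q1 − c·IQR and Q3 + c·IQR with c = (k − z₀.₇₅)/(2·z₀.₇₅); percentage inhibition computed by one common convention, 100 × (μ_neg − x)/(μ_neg − μ_pos); the hypothetical helper names; and the invented plate data.

```python
from statistics import NormalDist, quantiles

Z75 = NormalDist().inv_cdf(0.75)  # ~0.6745, upper quartile of the standard normal


def quartile_bounds(values, k=3.0):
    """Quartile-based cutoffs Q1 - c*IQR, Q3 + c*IQR (c matches mean +/- k SD under normality)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    c = (k - Z75) / (2 * Z75)
    iqr = q3 - q1
    return q1 - c * iqr, q3 + c * iqr


def percent_inhibition(raw, neg_mean, pos_mean):
    """Assumed convention: 0% at the negative-control mean, 100% at the positive-control mean."""
    return [100.0 * (neg_mean - x) / (neg_mean - pos_mean) for x in raw]


def experimentwise_hits(plates, k=3.0):
    """Pool every well from every plate and apply one pair of cutoffs to all of them."""
    pooled = [v for wells in plates.values() for v in wells]
    lower, upper = quartile_bounds(pooled, k)
    return {(plate, w) for plate, wells in plates.items()
            for w, v in enumerate(wells) if v < lower or v > upper}


def platewise_hits(plates, k=3.0):
    """Compute separate cutoffs for each plate, so plate-level shifts are absorbed."""
    hits = set()
    for plate, wells in plates.items():
        lower, upper = quartile_bounds(wells, k)
        hits |= {(plate, w) for w, v in enumerate(wells) if v < lower or v > upper}
    return hits


print(percent_inhibition([100.0, 50.0, 0.0], neg_mean=100.0, pos_mean=0.0))  # [0.0, 50.0, 100.0]

# Hypothetical percentage-inhibition values; plate "C" carries a cluster of five strong hits.
inactive = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
plates = {"A": inactive, "B": inactive, "C": [-4, -3, -2, -1, 50, 52, 54, 56, 58]}
print(sorted(experimentwise_hits(plates)))  # wells 4-8 on plate C
print(sorted(platewise_hits(plates)))       # []: the cluster inflates plate C's own IQR
```

With five strong hits packed onto plate C, that plate's own IQR becomes very large, so the platewise cutoffs miss all of them, while the pooled experimentwise cutoffs flag exactly those five wells. Conversely, a plate whose values are all shifted by a systematic error would tend to be flagged experimentwise but absorbed by platewise cutoffs. This is why the abstract suggests rechecking any cluster on untransformed data before choosing between the two.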