Jia Gaoxiang, Wang Xinlei, Xiao Guanghua
Department of Statistical Science, Southern Methodist University, Dallas, TX, 75205, USA.
Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
BMC Genomics. 2017 Jul 19;18(1):545. doi: 10.1186/s12864-017-3938-5.
Clustered regularly interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single algorithm has gained widespread adoption. Rigorous procedures are therefore needed to overcome the shortcomings of existing algorithms.
We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level by permuting sgRNA labels and thereby avoids restrictive distributional assumptions. Although PBNPA is designed to analyze CRISPR data, it can also be applied to genetic screens implemented with siRNAs or shRNAs and to drug screens.
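The permutation idea can be sketched in a few lines of Python. This is a minimal illustration and not the PBNPA package's implementation: the gene-level score (mean sgRNA log fold change), the one-sided test for depletion, the pooled null distribution, and every name used (`permutation_gene_pvalues`, `lfc`, `genes`, `n_perm`, the toy gene `HIT`) are assumptions made for the example.

```python
import random
from collections import defaultdict
from statistics import mean


def permutation_gene_pvalues(lfc, genes, n_perm=200, seed=0):
    """One-sided (depletion) permutation p-values per gene.

    lfc   : sgRNA-level log fold changes (hypothetical input format)
    genes : gene label for each sgRNA, in the same order as lfc
    """
    rng = random.Random(seed)

    def gene_scores(labels):
        # Gene score = mean log fold change of the sgRNAs assigned to it.
        groups = defaultdict(list)
        for value, gene in zip(lfc, labels):
            groups[gene].append(value)
        return {gene: mean(values) for gene, values in groups.items()}

    observed = gene_scores(genes)

    # Null distribution: gene scores recomputed after shuffling sgRNA labels.
    null_scores = []
    shuffled = list(genes)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_scores.extend(gene_scores(shuffled).values())

    # p-value = share of null scores at least as low as the observed score.
    # Adding 1 to numerator and denominator avoids p-values of exactly zero.
    total = len(null_scores)
    return {
        gene: (1 + sum(s <= score for s in null_scores)) / (1 + total)
        for gene, score in observed.items()
    }


# Toy data: 19 null genes and one strongly depleted gene, 4 sgRNAs each.
data_rng = random.Random(1)
genes = [f"G{i}" for i in range(19) for _ in range(4)] + ["HIT"] * 4
lfc = [data_rng.gauss(0.0, 1.0) for _ in range(76)] + [-5.0] * 4
pvals = permutation_gene_pvalues(lfc, genes)
```

Because the null distribution is built from the data themselves, no parametric model of read counts is needed. The price is that the smallest attainable p-value is limited by the number of permuted scores.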
We compared the performance of PBNPA with competing methods on simulated data as well as on real data. On simulated data under various settings, PBNPA outperformed recent methods designed for CRISPR screen analysis, as well as methods used for analyzing other functional genomics screens, in terms of receiver operating characteristic (ROC) curves and false discovery rate (FDR) control. PBNPA also showed better consistency and FDR control on published real data.
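On simulated data the true hits are known, so FDR and the true positive rate can be computed directly for any significance cutoff. The sketch below shows only the definitions of these two metrics. The function name, the raw p-value cutoff of 0.05, and the toy p-values are assumptions for illustration and are not taken from the paper.

```python
def empirical_fdr_tpr(pvalues, true_hits, threshold=0.05):
    """Empirical FDR and true positive rate when the truth is known.

    pvalues   : dict mapping gene name to p-value (hypothetical input)
    true_hits : set of genes simulated as real hits
    """
    called = {gene for gene, p in pvalues.items() if p <= threshold}
    false_pos = len(called - true_hits)
    true_pos = len(called & true_hits)
    # FDR = false discoveries / all discoveries (0 if nothing is called).
    fdr = false_pos / len(called) if called else 0.0
    # TPR = recovered true hits / all true hits.
    tpr = true_pos / len(true_hits) if true_hits else 0.0
    return fdr, tpr


# Toy example: A, B and C are called; C is a false positive, D is missed.
toy_pvalues = {"A": 0.001, "B": 0.02, "C": 0.04, "D": 0.3, "E": 0.8}
toy_truth = {"A", "B", "D"}
fdr, tpr = empirical_fdr_tpr(toy_pvalues, toy_truth)
```

Sweeping `threshold` from 0 to 1 and recording the true positive rate against the false positive rate at each step traces out a ROC curve.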
PBNPA yields more consistent and reliable results than its competitors, especially when the data quality is low. The PBNPA R package is available at https://cran.r-project.org/web/packages/PBNPA/.