Jia Cheng, Hu Yu, Liu Yichuan, Li Mingyao
Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
Cancer Inform. 2015 Feb 16;14(Suppl 1):45-53. doi: 10.4137/CIN.S24832. eCollection 2015.
Alternative splicing is one of the major mechanisms for generating mRNA diversity. It is a regulated process that allows functionally different proteins to be produced from the same genomic sequence. This process is often altered in cancer cells, producing aberrant proteins that drive cancer progression. A better understanding of splicing misregulation could help in the development of novel targets for pharmacological intervention in cancer.
In this study, we evaluated three statistical methods for the analysis of splicing quantitative trait loci (sQTLs) using RNA-Seq data: random effects meta-regression, beta regression, and a generalized linear mixed effects model. All three methods use exon-inclusion levels estimated by the PennSeq algorithm, a statistical method that uses paired-end reads and accounts for non-uniform sequencing coverage.
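To illustrate the first of these methods, the following is a minimal Python sketch of a random effects meta-regression of exon-inclusion levels on genotype. It is not the authors' implementation. It assumes that each sample contributes an inclusion-level estimate (psi) together with a known sampling variance, that genotypes are coded additively as 0/1/2, that the between-sample variance tau^2 is estimated with a method-of-moments (DerSimonian-Laird-type) estimator, and that the genotype effect is assessed with a Wald z-test. The function names `re_meta_regression` and `_wls` are hypothetical.

```python
import math


def _wls(y, g, w):
    """Closed-form weighted least squares for y = b0 + b1*g with weights w."""
    s0 = sum(w)
    s1 = sum(wi * gi for wi, gi in zip(w, g))
    s2 = sum(wi * gi * gi for wi, gi in zip(w, g))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * gi * yi for wi, gi, yi in zip(w, g, y))
    det = s0 * s2 - s1 * s1  # determinant of X'WX for X = [1, g]
    b0 = (s2 * t0 - s1 * t1) / det
    b1 = (s0 * t1 - s1 * t0) / det
    return b0, b1, s0, s1, s2, det


def re_meta_regression(psi, var, geno):
    """Hypothetical random-effects meta-regression of inclusion levels on genotype.

    psi  : per-sample exon-inclusion estimates
    var  : per-sample sampling variances of psi (assumed to be available)
    geno : genotype dosages, additively coded 0/1/2 (assumption)
    Returns (b0, b1, se_b1, z, p_value, tau2).
    """
    k = len(psi)
    if k != len(var) or k != len(geno):
        raise ValueError("psi, var and geno must have the same length")
    if k < 3 or len(set(geno)) < 2:
        raise ValueError("need >= 3 samples and at least two genotype classes")
    if any(v <= 0 for v in var):
        raise ValueError("sampling variances must be positive")

    # Step 1: fixed-effects fit with inverse-variance weights w_i = 1 / v_i,
    # and the residual heterogeneity statistic Q_E.
    w = [1.0 / v for v in var]
    b0, b1, s0, s1, s2, det = _wls(psi, geno, w)
    q_e = sum(wi * (yi - b0 - b1 * gi) ** 2 for wi, yi, gi in zip(w, psi, geno))

    # Step 2: method-of-moments tau^2 from E[Q_E] = (k - 2) + tau^2 * tr(P),
    # where tr(P) = tr(W) - tr((X'WX)^{-1} X'W^2 X).
    u0 = sum(wi * wi for wi in w)
    u1 = sum(wi * wi * gi for wi, gi in zip(w, geno))
    u2 = sum(wi * wi * gi * gi for wi, gi in zip(w, geno))
    tr_p = s0 - (s2 * u0 - 2 * s1 * u1 + s0 * u2) / det
    tau2 = max(0.0, (q_e - (k - 2)) / tr_p)

    # Step 3: refit with random-effects weights 1 / (v_i + tau^2) and Wald-test b1.
    w_re = [1.0 / (v + tau2) for v in var]
    b0, b1, s0, _, _, det = _wls(psi, geno, w_re)
    se_b1 = math.sqrt(s0 / det)  # sqrt of the (b1, b1) entry of (X'W*X)^{-1}
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return b0, b1, se_b1, z, p_value, tau2


if __name__ == "__main__":
    # Toy input: inclusion levels that vary more than their sampling variances imply.
    result = re_meta_regression(
        [0.1, 0.5, 0.2, 0.6, 0.3, 0.7], [0.001] * 6, [0, 0, 1, 1, 2, 2]
    )
    print(result)
```

Because tau^2 absorbs biological variation beyond the per-sample estimation error, the random-effects weights down-weight samples less aggressively than pure inverse-variance weights would. Running such a test for every exon-SNP pair and then applying a false discovery rate procedure to the p-values is one plausible workflow, but it is likewise an assumption here.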
Using both simulated and real RNA-Seq datasets, we compared these three methods with GLiMMPS, a recently developed method for sQTL analysis. Our results indicate that the random effects meta-regression approach was the most reliable and powerful method: it identified sQTLs at low false discovery rates while achieving higher power than GLiMMPS.
We evaluated three statistical methods for the analysis of sQTLs in RNA-Seq data. Our results should help researchers select appropriate statistical methods for sQTL analysis.