Kotoka Ekua, Orr Megan.
Stat Appl Genet Mol Biol. 2017 Nov 27;16(5-6):291-312. doi: 10.1515/sagmb-2016-0037.
RNA-Seq is a developing technology for generating gene expression data by directly sequencing the mRNA molecules in a sample. RNA-Seq data consist of counts of reads assigned to a particular gene, and these counts are often used to identify differentially expressed (DE) genes. A common statistical method for analyzing RNA-Seq data is Significance Analysis of Microarrays with emphasis on RNA-Seq data (SAMseq). SAMseq is a nonparametric method that uses a resampling technique to account for differences in sequencing depth when identifying DE genes. We propose a modification of this method that accounts for asymmetry in the distribution of the effect sizes by using the sign of the test statistics. Through simulation studies, we show that the proposed method, compared with the traditional SAMseq method and other existing methods, provides better power for identifying truly DE genes or more adequately controls the false discovery rate (FDR) in most settings where asymmetry is present. We illustrate the use of the proposed method by analyzing an RNA-Seq data set containing samples from the C57BL/6J (B6) and DBA/2J (D2) mouse strains.
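The general idea in the abstract (equalize sequencing depth by resampling, compute a rank-based statistic, then treat positive and negative statistics separately) can be illustrated with a minimal sketch. This is not the authors' procedure and not the SAMseq implementation: the function names, the binomial thinning step, the single resampling pass, the order-based tie handling, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np

def downsample_to_common_depth(counts, rng):
    """Binomially thin each sample (column) toward the smallest library size.
    Assumption: a stand-in for a depth-equalizing resampling step."""
    depth = counts.sum(axis=0)
    prob = depth.min() / depth          # per-sample keep probability, <= 1
    return rng.binomial(counts, prob)   # prob broadcasts across genes (rows)

def centered_rank_sum(counts, labels):
    """Rank-sum statistic of group 1 per gene, minus its null expectation.
    Ties are broken by sample order here, purely for simplicity."""
    ranks = counts.argsort(axis=1).argsort(axis=1) + 1
    n1, n = labels.sum(), labels.size
    return ranks[:, labels].sum(axis=1) - n1 * (n + 1) / 2

def tail_fdr(stat, null, cutoff, tail):
    """Permutation-based FDR estimate for one tail only.
    `null` holds permuted statistics with shape (genes, permutations)."""
    if tail == "up":
        called = np.sum(stat >= cutoff)
        false = np.mean(np.sum(null >= cutoff, axis=0))
    else:
        called = np.sum(stat <= -cutoff)
        false = np.mean(np.sum(null <= -cutoff, axis=0))
    return min(1.0, float(false / called)) if called > 0 else 0.0

# Hypothetical data: 200 genes, 5 samples per group, 20 genes up in group 1.
rng = np.random.default_rng(0)
labels = np.array([False] * 5 + [True] * 5)
counts = rng.poisson(20, size=(200, 10))
counts[:20, labels] *= 4  # asymmetric effects: all DE genes are up-regulated

thinned = downsample_to_common_depth(counts, rng)
stat = centered_rank_sum(thinned, labels)
null = np.column_stack(
    [centered_rank_sum(thinned, rng.permutation(labels)) for _ in range(100)]
)
print(tail_fdr(stat, null, 10, "up"), tail_fdr(stat, null, 10, "down"))
```

Estimating the FDR separately for each tail is one simple way to let the sign of the statistic matter when most DE genes shift in the same direction; a symmetric rule based on the absolute value of the statistic would pool both tails.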