Lee Juhee, Ji Yuan, Liang Shoudan, Cai Guoshuai, Müller Peter
Department of Biostatistics, UT M.D. Anderson Cancer Center, Houston, Texas, USA.
Cancer Inform. 2011;10:205-15. doi: 10.4137/CIN.S7473. Epub 2011 Aug 1.
RNA-Seq is a novel technology that provides read counts of RNA fragments in each gene, including the mapped position of each read within the gene. Among many other applications, it can be used to detect differentially expressed genes. Most published methods collapse the position-level read data into a single gene-specific expression measurement, and statistical inference then proceeds by modeling these gene-level measurements.
We present a Bayesian method of calling differential expression (BM-DE) that directly models the position-level read counts. We demonstrate the potential advantage of the BM-DE method compared to existing approaches that rely on gene-level aggregate data. An important additional feature of the proposed approach is that BM-DE can be used to analyze RNA-Seq data from experiments without biological replicates. This is possible because the approach works with multiple position-level read counts for each gene. We demonstrate the importance of modeling position-level read counts with a yeast data set and a simulation study.
A public domain R package is available from http://odin.mdacc.tmc.edu/~ylji/BMDE/.
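The contrast between gene-level aggregation and position-level counts described in the abstract can be illustrated with a minimal sketch. This is not the BM-DE model, which the abstract does not specify; it is a naive comparison under stated assumptions: the counts are made-up toy values, both conditions are assumed to have equal sequencing depth, positions are treated as independent, and the function names and the pseudocount of 0.5 are hypothetical choices made only for illustration.

```python
import math

# Toy position-level read counts for one gene in two conditions
# (hypothetical values, one sample per condition, no replicates).
cond_a = [12, 15, 9, 14, 11, 13]
cond_b = [25, 31, 17, 29, 20, 27]


def gene_level_log_ratio(counts_a, counts_b, pseudo=0.5):
    """Collapse all positions into one total per condition and
    return a single log2 ratio (condition B over condition A)."""
    return math.log2((sum(counts_b) + pseudo) / (sum(counts_a) + pseudo))


def position_level_log_ratios(counts_a, counts_b, pseudo=0.5):
    """Return one log2 ratio per mapped position."""
    return [
        math.log2((b + pseudo) / (a + pseudo))
        for a, b in zip(counts_a, counts_b)
    ]


def mean_and_se(values):
    """Sample mean and standard error of the mean (needs >= 2 values)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


# Gene-level: a single number, so no within-gene variability estimate.
collapsed = gene_level_log_ratio(cond_a, cond_b)

# Position-level: several observations per gene, so a spread can be
# estimated even though each condition has only one sample.
ratios = position_level_log_ratios(cond_a, cond_b)
mean_ratio, se_ratio = mean_and_se(ratios)

print(f"gene-level log2 ratio:     {collapsed:.3f}")
print(f"position-level mean (SE):  {mean_ratio:.3f} ({se_ratio:.3f})")
```

The collapsed ratio gives one value per gene, whereas the position-level ratios supply several observations per gene from which a rough measure of uncertainty can be computed. This loosely mirrors why position-level data can support analysis without biological replicates; a real analysis would have to account for sequencing depth and for dependence among neighboring positions.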