Gim Jungsoo, Won Sungho, Park Taesung
Institute of Health and Environment, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, Korea.
Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, Korea.
PLoS One. 2016 Aug 17;11(8):e0159182. doi: 10.1371/journal.pone.0159182. eCollection 2016.
RNA sequencing (RNA-Seq) provides valuable information for characterizing the molecular nature of cells, in particular for identifying differentially expressed transcripts on a genome-wide scale. Unfortunately, cost and limited specimen availability often lead to studies with small sample sizes, and hypothesis testing of differential expression between classes with a small number of samples is generally limited. The problem is especially challenging when only one sample exists per class. In this case, only a few of the many methods that have been developed are applicable for identifying differentially expressed transcripts. The aim of this study was therefore to develop a method that can accurately test differential expression with a limited number of samples, in particular non-replicated samples. We propose a local-pooled-error method for RNA-Seq data (LPEseq) to account for non-replicated samples in the analysis of differential expression. LPEseq extends the existing LPE method, which was proposed for microarray data, to allow testing of non-replicated RNA-Seq experiments. We demonstrated the validity of LPEseq using both real and simulated datasets. Compared with other methods, LPEseq performed better on non-replicated datasets and performed similarly on replicated samples; it consistently showed a high true discovery rate without increasing the rate of false positives, regardless of the number of samples. LPEseq can be used effectively for differential expression analysis as a preliminary design step, or for investigating a rare specimen for which only a limited number of samples is available.