Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data.
Author information
Department of Statistics, Stanford University, Stanford, CA 94305, USA.
Publication information
Stat Methods Med Res. 2013 Oct;22(5):519-36. doi: 10.1177/0962280211428386. Epub 2011 Nov 28.
We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or 'sequencing depths'. Existing methods for this problem are based on Poisson or negative binomial models: they are useful but can be heavily influenced by 'outliers' in the data. We introduce a simple, non-parametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class or multiple-class outcomes. We compare our proposed method to Poisson and negative binomial-based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods.
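The abstract does not spell out the resampling scheme or the test statistic, so the following is only a minimal sketch of the general idea for a two-class outcome, not the authors' implementation. It assumes that sequencing depth can be approximated by each sample's total count, that counts are equalized by binomial thinning down to the smallest depth, and that a Wilcoxon-style rank-sum statistic is averaged over resamples. All function names (`downsample`, `midranks`, `rank_sum_scores`) are hypothetical.

```python
import random


def downsample(counts, depths, rng):
    """Binomially thin each sample's counts down to the smallest depth (assumed scheme)."""
    target = min(depths)
    thinned = []
    for sample_counts, depth in zip(counts, depths):
        keep = target / depth  # probability of keeping each individual read
        thinned.append(
            [sum(rng.random() < keep for _ in range(c)) for c in sample_counts]
        )
    return thinned


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_scores(counts, labels, n_resamples=20, seed=0):
    """Centered rank-sum score per feature, averaged over resamples.

    counts: one list of feature counts per sample.
    labels: 0 or 1 per sample (two-class outcome).
    """
    rng = random.Random(seed)
    depths = [sum(sample) for sample in counts]  # assumption: depth = total count
    n_features = len(counts[0])
    n_class1 = sum(labels)
    expected = n_class1 * (len(labels) + 1) / 2  # rank sum under no association
    totals = [0.0] * n_features
    for _ in range(n_resamples):
        thinned = downsample(counts, depths, rng)
        for j in range(n_features):
            ranks = midranks([sample[j] for sample in thinned])
            stat = sum(r for r, y in zip(ranks, labels) if y == 1)
            totals[j] += stat - expected
    return [t / n_resamples for t in totals]


# Toy data: three features, six samples with depths 1000, 2000 and 4000 per class.
counts = [
    [100, 500, 400], [200, 1000, 800], [400, 2000, 1600],  # class 0
    [400, 500, 100], [800, 1000, 200], [1600, 2000, 400],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]
scores = rank_sum_scores(counts, labels)
```

Because the statistic depends only on ranks, a single extreme count in one sample cannot dominate a feature's score, which is the kind of robustness to outliers the abstract contrasts with Poisson and negative binomial models. In this toy data the first feature is consistently higher in class 1 and the third consistently lower, so they receive positive and negative scores respectively.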