Bioinformatics and Genomics Department, Centro de Investigación Príncipe Felipe, 46012 Valencia, Spain.
Genome Res. 2011 Dec;21(12):2213-23. doi: 10.1101/gr.124321.111. Epub 2011 Sep 8.
Next-generation sequencing (NGS) technologies are revolutionizing genome research, and in particular, their application to transcriptomics (RNA-seq) is increasingly being used for gene expression profiling as a replacement for microarrays. However, the properties of RNA-seq data have not yet been fully established, and additional research is needed to understand how these data respond to differential expression analysis. In this work, we set out to gain insights into the characteristics of RNA-seq data analysis by studying an important parameter of this technology: the sequencing depth. We have analyzed how sequencing depth affects the detection of transcripts and their identification as differentially expressed, looking at aspects such as transcript biotype, length, expression level, and fold-change. We have evaluated different algorithms available for the analysis of RNA-seq and proposed a novel approach, NOISeq, that differs from existing methods in that it is data-adaptive and nonparametric. Our results reveal that most existing methodologies suffer from a strong dependency on sequencing depth for their differential expression calls, and that this results in a considerable number of false positives that increases as the number of reads grows. In contrast, our proposed method models the noise distribution from the actual data, can therefore better adapt to the size of the data set, and is more effective in controlling the rate of false discoveries. This work discusses the true potential of RNA-seq for studying regulation at low expression ranges, the noise within RNA-seq data, and the issue of replication.
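The idea of modeling the noise distribution from the actual data can be illustrated with a minimal Python sketch. It is not the published NOISeq implementation: the choice of statistics (a log2 ratio M and an absolute difference D), the pseudocount `eps`, and the function names `md_values` and `noise_based_probability` are all assumptions made for illustration. The sketch builds an empirical noise distribution by contrasting replicates within the same condition, then scores each gene by the fraction of noise points that its between-condition contrast exceeds.

```python
import numpy as np


def md_values(a, b, eps=0.5):
    """Return the log2 ratio (M) and absolute difference (D) of two vectors.

    eps is a hypothetical pseudocount that avoids log(0); it is an
    assumption of this sketch, not a value taken from the abstract.
    """
    a = np.asarray(a, dtype=float) + eps
    b = np.asarray(b, dtype=float) + eps
    return np.log2(a / b), np.abs(a - b)


def noise_based_probability(cond1, cond2):
    """Score genes against an empirical, data-derived noise distribution.

    cond1, cond2: arrays of shape (genes, replicates), assumed to be
    already normalized for sequencing depth.
    Returns one value in [0, 1] per gene; higher means the between-condition
    change is larger than most replicate-to-replicate changes.
    """
    cond1 = np.asarray(cond1, dtype=float)
    cond2 = np.asarray(cond2, dtype=float)

    # Signal: contrast the per-gene mean expression between conditions.
    m_sig, d_sig = md_values(cond1.mean(axis=1), cond2.mean(axis=1))

    # Noise: pool contrasts between replicate pairs within each condition.
    noise_m, noise_d = [], []
    for cond in (cond1, cond2):
        n_rep = cond.shape[1]
        for i in range(n_rep):
            for j in range(i + 1, n_rep):
                m, d = md_values(cond[:, i], cond[:, j])
                noise_m.append(np.abs(m))
                noise_d.append(d)
    if not noise_m:
        raise ValueError("at least one condition needs two or more replicates")
    noise_m = np.concatenate(noise_m)
    noise_d = np.concatenate(noise_d)

    # Per gene: fraction of noise points with both a smaller |M| and a smaller D.
    return np.array([
        np.mean((noise_m < am) & (noise_d < dd))
        for am, dd in zip(np.abs(m_sig), d_sig)
    ])


# Toy data (made up for illustration): 4 genes, 2 replicates per condition.
cond1 = np.array([[100, 102], [50, 49], [10, 11], [200, 198]])
cond2 = np.array([[101, 99], [51, 50], [10, 10], [20, 22]])
prob = noise_based_probability(cond1, cond2)
```

Because the noise distribution is rebuilt from whatever replicates are supplied, the score scales with the data set rather than with a fixed parametric model, which is the property the abstract attributes to a data-adaptive, nonparametric approach. In the toy data only the last gene changes far more between conditions than between replicates, so it receives the highest score.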