Institute of Genomics and Systems Biology, Committee on Development, Regeneration, and Stem Cell Biology and Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
Bioinformatics. 2014 Feb 1;30(3):301-4. doi: 10.1093/bioinformatics/btt688. Epub 2013 Dec 6.
RNA-seq is replacing microarrays as the primary tool for gene expression studies. Many RNA-seq studies have used insufficient biological replicates, resulting in low statistical power and inefficient use of sequencing resources.
We show the explicit trade-off between more biological replicates and deeper sequencing in increasing power to detect differentially expressed (DE) genes. In the human cell line MCF7, adding more sequencing depth after 10 M reads gives diminishing returns on power to detect DE genes, whereas adding biological replicates improves power significantly regardless of sequencing depth. We also propose a cost-effectiveness metric for guiding the design of large-scale RNA-seq DE studies. Our analysis shows that sequencing fewer reads and performing more biological replication is an effective strategy to increase power and accuracy in large-scale differential expression RNA-seq studies, and provides new insights into efficient experimental design of RNA-seq studies.
The code used in this paper is available at http://home.uchicago.edu/~jiezhou/replication/. The expression data is deposited in the Gene Expression Omnibus under the accession ID GSE51403.
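As an illustration of the budget side of the replicate-versus-depth trade-off described above, the following is a minimal sketch, not the authors' code and not the cost-effectiveness metric proposed in the paper. The names `Design`, `design_cost`, and `max_replicates`, as well as the per-library and per-million-read prices, are hypothetical placeholders chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One candidate experiment layout (hypothetical helper type)."""
    replicates: int        # biological replicates per condition
    depth_millions: float  # reads per sample, in millions


def design_cost(design, n_conditions=2, library_cost=100.0, cost_per_million=10.0):
    """Total cost of a design under assumed, made-up unit prices.

    Each sample pays a fixed library-preparation cost plus a
    sequencing cost proportional to its read depth.
    """
    samples = design.replicates * n_conditions
    per_sample = library_cost + cost_per_million * design.depth_millions
    return samples * per_sample


def max_replicates(budget, depth_millions, n_conditions=2,
                   library_cost=100.0, cost_per_million=10.0):
    """Largest number of replicates per condition affordable at a given depth."""
    one_replicate = design_cost(
        Design(1, depth_millions), n_conditions, library_cost, cost_per_million
    )
    return int(budget // one_replicate)


if __name__ == "__main__":
    budget = 3000.0  # hypothetical total budget, same currency as the unit prices
    for depth in (30.0, 10.0, 5.0):
        n = max_replicates(budget, depth)
        print(f"{depth:>5.1f} M reads/sample: up to {n} replicates per condition")
```

Under these assumed prices, lowering the depth per sample frees budget for additional replicates. Whether that improves power for a particular study still has to be estimated from data, as the paper does for MCF7.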