How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?

Authors

Schurch Nicholas J, Schofield Pietá, Gierliński Marek, Cole Christian, Sherstnev Alexander, Singh Vijender, Wrobel Nicola, Gharbi Karim, Simpson Gordon G, Owen-Hughes Tom, Blaxter Mark, Barton Geoffrey J

Affiliations

Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom.

Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom; Division of Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom.

Publication information

RNA. 2016 Jun;22(6):839-51. doi: 10.1261/rna.053959.115. Epub 2016 Mar 28.

DOI:10.1261/rna.053959.115

PMID:27022035

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4878611/

Abstract

RNA-seq is now the technology of choice for genome-wide differential gene expression experiments, but it is not clear how many biological replicates are needed to ensure valid biological interpretation of the results or which statistical tools are best for analyzing the data. An RNA-seq experiment with 48 biological replicates in each of two conditions was performed to answer these questions and provide guidelines for experimental design. With three biological replicates, nine of the 11 tools evaluated found only 20%-40% of the significantly differentially expressed (SDE) genes identified with the full set of 42 clean replicates. This rises to >85% for the subset of SDE genes changing in expression by more than fourfold. To achieve >85% for all SDE genes regardless of fold change requires more than 20 biological replicates. The same nine tools successfully control their false discovery rate at ≲5% for all numbers of replicates, while the remaining two tools fail to control their FDR adequately, particularly for low numbers of replicates. For future RNA-seq experiments, these results suggest that at least six biological replicates should be used, rising to at least 12 when it is important to identify SDE genes for all fold changes. If fewer than 12 replicates are used, a superior combination of true positive and false positive performances makes edgeR and DESeq2 the leading tools. For higher replicate numbers, minimizing false positives is more important and DESeq marginally outperforms the other tools.
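The two quantities the abstract reports, the fraction of SDE genes recovered and the false discovery rate, can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the set-based comparison against a single reference gene list, and the gene identifiers are assumptions made for illustration. The helper `suggest_design` only restates the replicate and tool guidance given in the abstract.

```python
def recovery_and_fdr(called, reference):
    """Compare genes called SDE in a small-replicate run with a reference set.

    Assumption: `reference` stands in for the SDE genes found with the full
    set of clean replicates; `called` is the list from a reduced-replicate run.
    Returns (fraction of reference genes recovered, false discovery rate).
    """
    called, reference = set(called), set(reference)
    true_pos = called & reference    # called and present in the reference
    false_pos = called - reference   # called but absent from the reference
    recovered = len(true_pos) / len(reference) if reference else float("nan")
    fdr = len(false_pos) / len(called) if called else 0.0
    return recovered, fdr


def suggest_design(n_replicates, all_fold_changes=False):
    """Restate the abstract's guidance as a lookup (illustrative only)."""
    # At least 6 replicates, rising to at least 12 when SDE genes of all
    # fold changes must be identified.
    minimum = 12 if all_fold_changes else 6
    # Fewer than 12 replicates: edgeR and DESeq2 lead; otherwise DESeq
    # marginally outperforms the other tools.
    tools = ["edgeR", "DESeq2"] if n_replicates < 12 else ["DESeq"]
    return {
        "minimum": minimum,
        "meets_minimum": n_replicates >= minimum,
        "tools": tools,
    }


if __name__ == "__main__":
    # Hypothetical gene identifiers, for demonstration only.
    reference = [f"gene{i}" for i in range(1, 11)]
    called = ["gene1", "gene2", "gene3", "geneX"]
    print(recovery_and_fdr(called, reference))
    print(suggest_design(3))
```

In this toy input, three of ten reference genes are recovered and one of four calls is absent from the reference, which mirrors the kind of low recovery the abstract describes for three replicates.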

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f9c/4878611/0e75a13e69d6/839F1.jpg
