Detecting and correcting systematic variation in large-scale RNA sequencing data.

Authors

Li Sheng, Łabaj Paweł P, Zumbo Paul, Sykacek Peter, Shi Wei, Shi Leming, Phan John, Wu Po-Yen, Wang May, Wang Charles, Thierry-Mieg Danielle, Thierry-Mieg Jean, Kreil David P, Mason Christopher E

Affiliations

[1] Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA. [2] The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, USA. [3].

[1] Chair of Bioinformatics Research Group, Boku University Vienna, Vienna, Austria. [2].

Publication information

Nat Biotechnol. 2014 Sep;32(9):888-95. doi: 10.1038/nbt.3000. Epub 2014 Aug 24.

DOI:10.1038/nbt.3000

PMID:25150837

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4160374/

Abstract

High-throughput RNA sequencing (RNA-seq) enables comprehensive scans of entire transcriptomes, but best practices for analyzing RNA-seq data have not been fully defined, particularly for data collected with multiple sequencing platforms or at multiple sites. Here we used standardized RNA samples with built-in controls to examine sources of error in large-scale RNA-seq studies and their impact on the detection of differentially expressed genes (DEGs). Analysis of variations in guanine-cytosine content, gene coverage, sequencing error rate and insert size allowed identification of decreased reproducibility across sites. Moreover, commonly used methods for normalization (cqn, EDASeq, RUV2, sva, PEER) varied in their ability to remove these systematic biases, depending on sample complexity and initial data quality. Normalization methods that combine data from genes across sites are strongly recommended to identify and remove site-specific effects and can substantially improve RNA-seq studies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/909c/4160374/772bc17c916f/nihms617193f1.jpg

Similar articles

Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias.

PLoS Biol. 2019 Nov 12;17(11):e3000481. doi: 10.1371/journal.pbio.3000481. eCollection 2019 Nov.

Identification and correction of systematic error in high-throughput sequence data.

BMC Bioinformatics. 2011 Nov 21;12:451. doi: 10.1186/1471-2105-12-451.

GC-content normalization for RNA-Seq data.

BMC Bioinformatics. 2011 Dec 17;12:480. doi: 10.1186/1471-2105-12-480.

deGPS is a powerful tool for detecting differential expression in RNA-sequencing studies.

BMC Genomics. 2015 Jun 13;16(1):455. doi: 10.1186/s12864-015-1676-0.

Systematic evaluation of RNA-Seq preparation protocol performance.

BMC Genomics. 2019 Jul 11;20(1):571. doi: 10.1186/s12864-019-5953-1.

Using normalization to resolve RNA-Seq biases caused by amplification from minimal input.

Physiol Genomics. 2014 Nov 1;46(21):808-20. doi: 10.1152/physiolgenomics.00196.2013. Epub 2014 Sep 16.

BADGE: a novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data.

BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S6. doi: 10.1186/1471-2105-15-S9-S6. Epub 2014 Sep 10.

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium.

Nat Biotechnol. 2014 Sep;32(9):903-14. doi: 10.1038/nbt.2957. Epub 2014 Aug 24.

The impact of quality filter for RNA-Seq.

Gene. 2015 Jun 1;563(2):165-71. doi: 10.1016/j.gene.2015.03.033. Epub 2015 Mar 18.

Cited by

Integration of Bulk RNA-seq Pipeline Metrics for Assessing Low-Quality Samples.

Res Sq. 2025 Jul 3:rs.3.rs-6976695. doi: 10.21203/rs.3.rs-6976695/v1.

CorrAdjust unveils biologically relevant transcriptomic correlations by efficiently eliminating hidden confounders.

Nucleic Acids Res. 2025 May 22;53(10). doi: 10.1093/nar/gkaf444.

Worm Perturb-Seq: massively parallel whole-animal RNAi and RNA-seq.

Nat Commun. 2025 May 23;16(1):4785. doi: 10.1038/s41467-025-60154-0.

Worm Perturb-Seq: massively parallel whole-animal RNAi and RNA-seq.

bioRxiv. 2025 Feb 3:2025.02.02.636107. doi: 10.1101/2025.02.02.636107.

RNA-seq reproducibility of in laboratory models of cystic fibrosis.

Microbiol Spectr. 2025 Jan 7;13(1):e0151324. doi: 10.1128/spectrum.01513-24. Epub 2024 Dec 3.

Assessing and mitigating batch effects in large-scale omics studies.

Genome Biol. 2024 Oct 3;25(1):254. doi: 10.1186/s13059-024-03401-9.

Targeted DNA-seq and RNA-seq of Reference Samples with Short-read and Long-read Sequencing.

Sci Data. 2024 Aug 16;11(1):892. doi: 10.1038/s41597-024-03741-y.

Genomic reproducibility in the bioinformatics era.

Genome Biol. 2024 Aug 9;25(1):213. doi: 10.1186/s13059-024-03343-2.

A real-world multi-center RNA-seq benchmarking study using the Quartet and MAQC reference materials.

Nat Commun. 2024 Jul 22;15(1):6167. doi: 10.1038/s41467-024-50420-y.

Transcriptomics: Approaches to Quantifying Gene Expression and Their Application to Studying the Human Brain.

Curr Top Behav Neurosci. 2024;68:129-176. doi: 10.1007/7854_2024_466.

References

HTSeq--a Python framework to work with high-throughput sequencing data.

Bioinformatics. 2015 Jan 15;31(2):166-9. doi: 10.1093/bioinformatics/btu638. Epub 2014 Sep 25.

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium.

Nat Biotechnol. 2014 Sep;32(9):903-14. doi: 10.1038/nbt.2957. Epub 2014 Aug 24.

Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study.

Nat Biotechnol. 2014 Sep;32(9):915-925. doi: 10.1038/nbt.2972. Epub 2014 Aug 24.

voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.

Genome Biol. 2014 Feb 3;15(2):R29. doi: 10.1186/gb-2014-15-2-r29.

Transcriptomic dissection of myogenic differentiation signature in caprine by RNA-Seq.

Mech Dev. 2014 May;132:79-92. doi: 10.1016/j.mod.2014.01.001. Epub 2014 Jan 11.

Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories.

Nat Biotechnol. 2013 Nov;31(11):1015-22. doi: 10.1038/nbt.2702. Epub 2013 Sep 15.

Software for computing and annotating genomic ranges.

PLoS Comput Biol. 2013;9(8):e1003118. doi: 10.1371/journal.pcbi.1003118. Epub 2013 Aug 8.

Systematic biases in DNA copy number originate from isolation procedures.

Genome Biol. 2013 Apr 24;14(4):R33. doi: 10.1186/gb-2013-14-4-r33.

Comparative RNA-Seq and microarray analysis of gene expression changes in B-cell lymphomas of Canis familiaris.

PLoS One. 2013 Apr 4;8(4):e61088. doi: 10.1371/journal.pone.0061088. Print 2013.

Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data.

PLoS Comput Biol. 2013 Apr;9(4):e1003031. doi: 10.1371/journal.pcbi.1003031. Epub 2013 Apr 11.
