An integrative method to normalize RNA-Seq data.

Author information

Cyril Filloux, Cédric Meersseman, Romain Philippe, Lionel Forestier, Christophe Klopp, Dominique Rocha, Abderrahman Maftah, Daniel Petit

Affiliation

INRA, UMR1061, Unité Génétique Moléculaire Animale, 123 avenue Albert Thomas, F-87060 Limoges Cedex, France.

Publication information

BMC Bioinformatics. 2014 Jun 14;15:188. doi: 10.1186/1471-2105-15-188.

DOI:10.1186/1471-2105-15-188

PMID:24929920

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4067528/

Abstract

BACKGROUND

Transcriptome sequencing is a powerful tool for measuring gene expression but, as with other technologies, various artifacts and biases affect the quantification. To correct some of them, several normalization approaches have emerged, differing both in the statistical strategy employed and in the type of bias corrected. However, there is no clear standard normalization method.

RESULTS

We present a novel methodology to normalize RNA-Seq data that takes into account transcript size, GC content, and sequencing depth, which are the major quantification-related biases. First, we found that transcripts shorter than 600 bp have an underestimated expression level, while longer transcripts are overestimated, increasingly so as their length increases. Second, as is well known, the higher the GC content (>50%), the more the expression of transcripts is underestimated. Third, we demonstrated that sequencing depth affects the size bias, and we propose a correction that allows expression levels to be compared among many samples. The efficiency of our approach was then tested by comparing the correlation between normalized RNA-Seq data and qRT-PCR expression measurements. All steps are automated in a program written in Perl, available on request.
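The abstract does not give the correction formulas, so the following is only a minimal Python sketch of the quantities it names, not the authors' Perl program. It computes the two sequence covariates (transcript length and GC content), flags transcripts against the 600 bp and 50% GC thresholds quoted above, applies the standard RPKM scaling as a stand-in for a length and depth adjustment, and evaluates agreement with qRT-PCR through a Pearson correlation. All function names are hypothetical.

```python
import math


def gc_content(seq):
    """Fraction of G and C bases in a transcript sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def bias_flags(seq, short_bp=600, high_gc=0.5):
    """Flag a transcript against the thresholds quoted in the abstract.

    The thresholds come from the abstract; the flagging itself is an
    illustrative assumption, not the published correction.
    """
    return {
        "short": len(seq) < short_bp,
        "high_gc": gc_content(seq) > high_gc,
    }


def rpkm(read_count, length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads.

    Used here as a generic length/depth scaling, not the authors' method.
    """
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

A normalization would count as an improvement, in the sense used in the abstract, if `pearson(normalized_values, qpcr_values)` is higher than the same correlation computed on the uncorrected values for the same genes.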

CONCLUSIONS

The methodology presented in this article identifies and corrects different biases that influence RNA-Seq quantification, and provides more accurate estimates of gene expression levels. This method can be applied to compare expression quantifications from many samples, preferably from the same tissue. To compare samples from different tissues, a calibration using several reference genes will be required.
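The abstract does not say how the reference-gene calibration would be carried out. One common convention, borrowed from qRT-PCR practice and assumed here purely for illustration, is to divide every expression value in a sample by the geometric mean of that sample's reference genes. The gene names and values below are hypothetical.

```python
import math


def reference_scale(sample, reference_genes):
    """Geometric mean of the reference genes' expression in one sample."""
    logs = [math.log(sample[gene]) for gene in reference_genes]
    return math.exp(sum(logs) / len(logs))


def calibrate(sample, reference_genes):
    """Express every gene relative to the sample's reference-gene scale."""
    scale = reference_scale(sample, reference_genes)
    return {gene: value / scale for gene, value in sample.items()}


# Hypothetical normalized expression values for one tissue sample.
muscle = {"REF1": 4.0, "REF2": 16.0, "GENE_A": 24.0}
calibrated = calibrate(muscle, ["REF1", "REF2"])
```

After this step, a value such as `calibrated["GENE_A"]` is a ratio to the reference genes, so samples from different tissues are compared on a common scale only to the extent that the chosen reference genes are stably expressed across those tissues.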
