GC-content normalization for RNA-Seq data.

Affiliation

Division of Biostatistics and Department of Statistics, University of California, Berkeley, USA.

Publication information

BMC Bioinformatics. 2011 Dec 17;12:480. doi: 10.1186/1471-2105-12-480.

DOI:10.1186/1471-2105-12-480
PMID:22177264
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3315510/
Abstract

BACKGROUND

Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof.

RESULTS

We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq.

CONCLUSIONS

Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.
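
The abstract does not spell out the three within-lane approaches, so the following is only a minimal sketch of the general idea, under stated assumptions: genes in one lane are stratified into GC-content bins and rescaled so that every bin shares the lane-wide median count, after which lanes are rescaled to a common median. The function names `gc_bin_median_scale` and `between_lane_median_scale`, the equal-size binning, and the choice of the median as the matching statistic are all hypothetical illustrations. They are not the EDASeq API and are not claimed to reproduce any of the authors' methods.

```python
from statistics import median


def gc_bin_median_scale(counts, gc, n_bins=4):
    """Within-lane sketch (assumed scheme): equalize medians across GC bins.

    counts: read counts for the genes of ONE lane
    gc:     GC fraction (0..1) of each gene, in the same order as counts
    """
    if len(counts) != len(gc):
        raise ValueError("counts and gc must have the same length")
    out = [float(c) for c in counts]
    positive_all = [c for c in counts if c > 0]
    if not positive_all:
        return out  # nothing to normalize in an all-zero lane
    target = median(positive_all)  # lane-wide reference median

    # Rank genes by GC content and split the ranking into equal-size bins.
    order = sorted(range(len(gc)), key=lambda i: gc[i])
    n = len(order)
    bins = [order[k * n // n_bins:(k + 1) * n // n_bins] for k in range(n_bins)]

    for idx in bins:
        positive = [counts[i] for i in idx if counts[i] > 0]
        if not positive:
            continue  # empty or all-zero bin: leave its counts unchanged
        factor = target / median(positive)
        for i in idx:
            out[i] = counts[i] * factor
    return out


def between_lane_median_scale(lanes):
    """Between-lane sketch (assumed scheme): give every lane the same median."""
    medians = [median([c for c in lane if c > 0]) for lane in lanes]
    target = sum(medians) / len(medians)
    return [[c * target / m for c in lane] for lane, m in zip(lanes, medians)]


# Toy lane: the four high-GC genes have counts inflated two-fold.
toy_counts = [10, 20, 30, 40, 20, 40, 60, 80]
toy_gc = [0.30, 0.31, 0.32, 0.33, 0.60, 0.61, 0.62, 0.63]
toy_within = gc_bin_median_scale(toy_counts, toy_gc, n_bins=2)

# Two toy lanes that differ only by sequencing depth.
toy_between = between_lane_median_scale([[1, 2, 3], [2, 4, 6]])
```

In the toy lane, the low-GC and high-GC halves end up with matching values after within-lane scaling, which is the kind of sample-specific GC effect the abstract says can bias fold-change estimates if left uncorrected. A real analysis would work from the EDASeq package itself rather than this simplified scheme.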

