Removing technical variability in RNA-seq data using conditional quantile normalization.

Affiliation

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.

Publication information

Biostatistics. 2012 Apr;13(2):204-16. doi: 10.1093/biostatistics/kxr054. Epub 2012 Jan 27.

DOI:10.1093/biostatistics/kxr054
PMID:22285995
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3297825/
Abstract

The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade's worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find guanine-cytosine content (GC-content) has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here, we describe a statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content and quantile normalization to correct for global distortions.
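
The quantile normalization step mentioned in the abstract can be illustrated with a minimal sketch. This is plain quantile normalization only, not the authors' conditional quantile normalization: it does not model GC-content or any other covariate, and the robust regression component is omitted. The function name `quantile_normalize` and the toy values are hypothetical and chosen for illustration.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of values.

    Each sample's k-th smallest value is replaced by the mean of the k-th
    smallest values across all samples, so every sample ends up with the
    same empirical distribution. Ties are broken by position (an assumption
    of this sketch; other implementations average tied ranks).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of genes")

    # Reference distribution: mean of the sorted values at each rank.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(vals) for vals in zip(*sorted_cols)]

    normalized = []
    for col in columns:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: three samples (outer list), four genes each (inner lists).
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After this step every sample contains the same set of values, arranged according to each gene's original rank within its sample. That corrects global distributional differences between samples, but on its own it leaves gene-level, sample-specific effects such as the GC-content bias untouched.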

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/970e/3297825/c26fa3dfb889/biostskxr054f04_ht.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/970e/3297825/4cb6c1a91635/biostskxr054f01_3c.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/970e/3297825/865aa34b200f/biostskxr054f02_ht.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/970e/3297825/3dbff0996fec/biostskxr054f03_3c.jpg
