Correcting scale distortion in RNA sequencing data.

Author information

Thron Christopher, Jafari Farhad

Affiliations

Department of Science and Mathematics, Texas A&M University-Central Texas, Killeen, TX, 76549, USA.

Department of Radiology, University of Minnesota, Minneapolis, MN, 55455, USA.

Publication information

BMC Bioinformatics. 2025 Jan 28;26(1):32. doi: 10.1186/s12859-025-06041-3.

DOI:10.1186/s12859-025-06041-3
PMID:39875825
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11776150/
Abstract

RNA sequencing (RNA-seq) is the conventional genome-scale approach used to capture the expression levels of all detectable genes in a biological sample. It is now regularly used for population-based studies designed to identify genetic determinants of various diseases. Naturally, the accuracy of these tests should be verified and improved if possible. In this study, we aimed to detect and correct for expression-level-dependent errors that are not corrected by conventional normalization techniques. We examined several RNA-seq datasets from the Cancer Genome Atlas (TCGA), Stand Up 2 Cancer (SU2C), and GTEx databases with various types of preprocessing. By applying local averaging, we found expression-level-dependent biases that differ from sample to sample in all datasets studied. Using simulations, we show that these biases corrupt gene-gene correlation estimations and t tests between subpopulations. To mitigate these biases, we introduce two different nonlinear transforms, based on statistical considerations, that correct the observed biases. We demonstrate that these transforms effectively remove the observed per-sample biases, reduce sample-to-sample variance, and improve the characteristics of gene-gene correlation distributions. Using a novel simulation methodology that creates controlled differences between subpopulations, we show that these transforms reduce variability and increase sensitivity of two-population tests. The improvements in sensitivity and specificity were of the order of 3-5% in most instances after the data was corrected for bias. Altogether, these results improve our capacity to understand gene-gene relationships, and may lead to novel ways to utilize the information derived from clinical tests.
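
The local-averaging idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' transform, which the abstract does not specify. It assumes synthetic log-expression data, a hypothetical quadratic per-sample distortion, a gene-wise median as the reference profile, and a moving-average window of 201 genes. The function names `local_average` and `estimate_bias` are made up for this example.

```python
import numpy as np


def local_average(values, window):
    """Centered moving average with edge padding (window must be odd)."""
    pad = window // 2
    padded = np.pad(values, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


def estimate_bias(sample, reference, window=201):
    """Smooth (sample - reference) along genes ordered by reference level.

    The smoothed residual is an estimate of the sample's
    expression-level-dependent bias, returned in the original gene order.
    """
    order = np.argsort(reference)
    residual = sample[order] - reference[order]
    smoothed = local_average(residual, window)
    bias = np.empty_like(smoothed)
    bias[order] = smoothed
    return bias


# Synthetic data (assumption): 5000 genes, 20 samples, log-scale expression.
rng = np.random.default_rng(0)
n_genes, n_samples = 5000, 20
true_level = rng.uniform(0.0, 12.0, size=n_genes)
noise = rng.normal(0.0, 0.3, size=(n_genes, n_samples))

# Hypothetical distortion: a quadratic in expression level whose strength
# differs from sample to sample.
strength = rng.normal(0.0, 0.05, size=n_samples)
distortion = strength[None, :] * ((true_level - 6.0) ** 2)[:, None]
data = true_level[:, None] + noise + distortion

# Reference profile: gene-wise median across samples.
reference = np.median(data, axis=1)

# Estimate each sample's bias by local averaging and subtract it.
bias = np.column_stack(
    [estimate_bias(data[:, j], reference) for j in range(n_samples)]
)
corrected = data - bias

# Compare mean sample-to-sample variance per gene before and after.
var_before = data.var(axis=1).mean()
var_after = corrected.var(axis=1).mean()
print(f"mean per-gene variance before: {var_before:.3f}")
print(f"mean per-gene variance after:  {var_after:.3f}")
```

In this toy setting, subtracting the locally averaged residual removes most of the sample-specific distortion, so the sample-to-sample variance falls toward the noise level. Real data would call for the transforms described in the full paper.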

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/65e3ee5b0de2/12859_2025_6041_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/fe05d5e41081/12859_2025_6041_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/c64390b98f37/12859_2025_6041_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/0be4a52bfcbc/12859_2025_6041_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/b580b61057e3/12859_2025_6041_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/231fcee6e2e7/12859_2025_6041_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/f010d8bd0b73/12859_2025_6041_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/58acb1b8e661/12859_2025_6041_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/d5f9ece3b998/12859_2025_6041_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/cf6da29800f2/12859_2025_6041_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/b4acee33f202/12859_2025_6041_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/2fa6f95741e0/12859_2025_6041_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/b89db54d98ce/12859_2025_6041_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/b6131288ccd0/12859_2025_6041_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/8f97906c6c8f/12859_2025_6041_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/9590bd088e99/12859_2025_6041_Fig16_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ba3/11776150/64c2c4e598a6/12859_2025_6041_Fig17_HTML.jpg