Bias detection and correction in RNA-Sequencing data.

Affiliation

Biostatistics Resource, Keck Laboratory, Yale University, 300 George Street, New Haven, Connecticut, 06510, USA.

Publication information

BMC Bioinformatics. 2011 Jul 19;12:290. doi: 10.1186/1471-2105-12-290.

DOI:10.1186/1471-2105-12-290
PMID:21771300
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3149584/
Abstract

BACKGROUND

High throughput sequencing technology provides us unprecedented opportunities to study transcriptome dynamics. Compared to microarray-based gene expression profiling, RNA-Seq has many advantages, such as high resolution, low background, and ability to identify novel transcripts. Moreover, for genes with multiple isoforms, expression of each isoform may be estimated from RNA-Seq data. Despite these advantages, recent work revealed that base level read counts from RNA-Seq data may not be randomly distributed and can be affected by local nucleotide composition. It was not clear though how the base level read count bias may affect gene level expression estimates.

RESULTS

In this paper, by using five published RNA-Seq data sets from different biological sources and with different data preprocessing schemes, we showed that commonly used estimates of gene expression levels from RNA-Seq data, such as reads per kilobase of gene length per million reads (RPKM), are biased in terms of gene length, GC content and dinucleotide frequencies. We directly examined the biases at the gene-level, and proposed a simple generalized-additive-model based approach to correct different sources of biases simultaneously. Compared to previously proposed base level correction methods, our method reduces bias in gene-level expression estimates more effectively.
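
For readers unfamiliar with the RPKM measure named above, the following is a minimal sketch of how it is conventionally computed from a raw read count. The function name and the numbers in the example are hypothetical and are not taken from the paper.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene length per million mapped reads.

    RPKM = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads, 2,000 bp long, in a library of 10 million mapped reads.
example_rpkm = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
```

Dividing by gene length and library size is meant to make values comparable across genes and samples; the abstract's point is that residual dependence on length, GC content and dinucleotide frequencies can remain after this scaling.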

CONCLUSIONS

Our method identifies and corrects different sources of biases in gene-level expression measures from RNA-Seq data, and provides more accurate estimates of gene expression levels from RNA-Seq. This method should prove useful in meta-analysis of gene expression levels using different platforms or experimental protocols.
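
The abstract does not give the model specification, so the following is only a toy illustration of the general idea of correcting several gene-level covariates at once with an additive model. It is not the authors' implementation: polynomial terms fitted by ordinary least squares stand in for the smooth terms of a generalized additive model, the data are simulated, and every name and constant is an assumption made for the example.

```python
import numpy as np


def additive_bias_correct(log_expr, covariates, degree=3):
    """Remove smooth covariate trends from gene-level log expression.

    Fits log_expr ~ intercept + sum_j poly_j(covariate_j) by least squares
    (a polynomial stand-in for GAM smooth terms) and returns the residuals
    shifted back to the original overall mean.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    columns = [np.ones_like(log_expr)]  # intercept
    for x in covariates:
        x = np.asarray(x, dtype=float)
        z = (x - x.mean()) / x.std()  # standardize for numerical stability
        for d in range(1, degree + 1):
            columns.append(z ** d)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, log_expr, rcond=None)
    fitted = design @ coef
    return log_expr - fitted + log_expr.mean()


# Simulated data (assumption): 2,000 genes with a length bias and a GC bias.
rng = np.random.default_rng(0)
n_genes = 2000
log_length = rng.normal(7.5, 1.0, n_genes)   # log gene length
gc_content = rng.uniform(0.3, 0.7, n_genes)  # GC fraction
true_log_expr = rng.normal(2.0, 1.0, n_genes)
observed = (true_log_expr
            + 0.4 * (log_length - 7.5)        # hypothetical length bias
            - 8.0 * (gc_content - 0.5) ** 2)  # hypothetical GC bias

corrected = additive_bias_correct(observed, [log_length, gc_content])

# Correlation with gene length before and after correction.
corr_before = abs(np.corrcoef(observed, log_length)[0, 1])
corr_after = abs(np.corrcoef(corrected, log_length)[0, 1])
```

One caveat of this kind of sketch: regressing out a covariate also removes any genuine biological association between expression and that covariate, so in practice the choice of covariates and smoothness needs care.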

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8277/3149584/3861e9c57c6c/1471-2105-12-290-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8277/3149584/89ab79ab69c9/1471-2105-12-290-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8277/3149584/2f9cc586f570/1471-2105-12-290-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8277/3149584/d77a5962d941/1471-2105-12-290-3.jpg
