Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments.

Affiliation

Division of Biostatistics, University of California, Berkeley, Berkeley, CA, USA.

Publication information

BMC Bioinformatics. 2010 Feb 18;11:94. doi: 10.1186/1471-2105-11-94.

DOI:10.1186/1471-2105-11-94
PMID:20167110
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2838869/
Abstract

BACKGROUND

High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data.

RESULTS

We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, base-calling calibration method (with and without a phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection.

CONCLUSIONS

Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
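
The normalization contrast described in the Results can be illustrated with a small sketch. It is not the authors' implementation: the toy counts, the function names, and the choice of the per-lane upper quartile of nonzero counts as the quantile-based scale factor are all assumptions made here for illustration. RPKM is computed with its usual definition (reads per kilobase of gene length per million mapped reads in the lane).

```python
from statistics import quantiles


def rpkm(count, gene_length_bp, lane_total):
    """Reads per kilobase of gene length per million reads in the lane."""
    return count * 1e9 / (gene_length_bp * lane_total)


def upper_quartile(counts):
    """75th percentile of the nonzero counts in one lane.

    Assumption of this sketch: zeros are dropped per lane, a simplification.
    """
    expressed = sorted(c for c in counts if c > 0)
    return quantiles(expressed, n=4, method="inclusive")[2]


def fold_change(lane_a, lane_b, gene, scale):
    """Ratio of scaled counts (lane B over lane A) for one gene.

    `scale` maps a lane's counts to a single scale factor, e.g. `sum`
    for total-count scaling or `upper_quartile` for quantile-based scaling.
    """
    scaled_a = lane_a[gene] / scale(lane_a.values())
    scaled_b = lane_b[gene] / scale(lane_b.values())
    return scaled_b / scaled_a


# Hypothetical toy data: only g5 changes between the two lanes.
lane_a = {"g1": 10, "g2": 20, "g3": 30, "g4": 40, "g5": 100}
lane_b = {"g1": 10, "g2": 20, "g3": 30, "g4": 40, "g5": 1000}

# Total-count scaling: the large g5 count inflates lane B's total,
# so the unchanged gene g1 appears strongly down-regulated.
fc_total = fold_change(lane_a, lane_b, "g1", sum)

# Upper-quartile scaling: the scale factor ignores the extreme g5 count,
# so g1 is reported as unchanged.
fc_uq = fold_change(lane_a, lane_b, "g1", upper_quartile)

print(f"g1 fold change, total-count scaling:    {fc_total:.3f}")
print(f"g1 fold change, upper-quartile scaling: {fc_uq:.3f}")
```

In this toy case a single highly expressed gene dominates the lane total, which is the kind of situation in which scaling by total lane counts can bias DE estimates for the remaining genes. Gene length cancels in a per-gene ratio between lanes, so `rpkm` is shown only to make the total-count dependence explicit.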

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54b1/2838869/277d60876059/1471-2105-11-94-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54b1/2838869/672ee4c3d1dc/1471-2105-11-94-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54b1/2838869/848b92a83bac/1471-2105-11-94-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54b1/2838869/e529859204cf/1471-2105-11-94-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54b1/2838869/2e004c494008/1471-2105-11-94-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54b1/2838869/67deb9a390a4/1471-2105-11-94-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54b1/2838869/96728c4963c4/1471-2105-11-94-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54b1/2838869/e09dee9bf78c/1471-2105-11-94-8.jpg
