Technical and biological variance structure in mRNA-Seq data: life in the real world.

Affiliation

Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN 55905, USA.

Publication information

BMC Genomics. 2012 Jul 7;13:304. doi: 10.1186/1471-2164-13-304.

DOI: 10.1186/1471-2164-13-304

PMID: 22769017

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3505161/

Abstract

BACKGROUND

mRNA expression data from next-generation sequencing platforms are obtained as counts per gene or exon. Counts have classically been assumed to follow a Poisson distribution, in which the variance equals the mean. The Negative Binomial distribution, which allows for over-dispersion (a variance greater than the mean), is also commonly used to model count data.
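As a rough illustration of the two variance assumptions described above, the sketch below simulates counts under a Poisson model, where Var(Y) = μ, and under a Negative Binomial model built as a gamma-Poisson mixture, where Var(Y) = μ + φμ². The mean μ = 20, dispersion φ = 0.5, sample size, seed, and the helper names (`poisson_sample`, `nb_sample`, `simulate`) are hypothetical choices for illustration, and the μ + φμ² parameterization is the convention common in RNA-Seq software rather than something taken from this paper. It uses only the Python standard library; Knuth's multiplication method is adequate for these moderate means but is not intended for large counts.

```python
import math
import random
import statistics


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count using Knuth's multiplication method.

    Fine for the moderate means used here; exp(-lam) underflows to 0 for lam above ~745.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def nb_sample(mu: float, phi: float, rng: random.Random) -> int:
    """Draw one Negative Binomial count as a gamma-Poisson mixture.

    lam ~ Gamma(shape=1/phi, scale=mu*phi) has mean mu and variance phi*mu**2,
    so the resulting count has mean mu and variance mu + phi*mu**2.
    """
    lam = rng.gammavariate(1.0 / phi, mu * phi)
    return poisson_sample(lam, rng)


def simulate(mu: float = 20.0, phi: float = 0.5, n: int = 20_000, seed: int = 1) -> dict:
    """Return (sample mean, sample variance) for n Poisson and n NB draws."""
    rng = random.Random(seed)
    pois = [poisson_sample(mu, rng) for _ in range(n)]
    nb = [nb_sample(mu, phi, rng) for _ in range(n)]
    return {
        "poisson": (statistics.fmean(pois), statistics.variance(pois)),
        "nb": (statistics.fmean(nb), statistics.variance(nb)),
    }


if __name__ == "__main__":
    mu, phi = 20.0, 0.5
    for name, (m, v) in simulate(mu, phi).items():
        print(f"{name:8s} mean={m:7.2f}  variance={v:8.2f}")
    print(f"theory   Poisson variance={mu:.2f}  NB variance={mu + phi * mu**2:.2f}")
```

With these settings the Poisson sample variance should land close to its mean, while the NB sample variance should land near the theoretical μ + φμ² = 220, well above the mean of about 20.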

RESULTS

In mRNA-Seq data from 25 subjects, we found that technical variation generally followed a Poisson distribution, as previously reported, whereas biological variability was over-dispersed relative to the Poisson model. The mean-variance relationship across all genes was quadratic, consistent with a Negative Binomial (NB) distribution. Over-dispersed Poisson and NB distributional assumptions markedly improved goodness-of-fit (GOF) over the standard Poisson model, but with evidence of over-fitting in some genes. Modeling experimental effects improved GOF for high-variance genes but worsened the over-fitting problem.
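The quadratic mean-variance relationship and the GOF comparison can be sketched in a similar spirit. The hypothetical helpers below (a) estimate one common dispersion φ by a least-squares fit through the origin of (v - m) = φm² across genes, where m and v are each gene's sample mean and sample variance, and (b) compute a per-gene Pearson dispersion statistic X²/(n - 1), which is near 1 when the assumed variance matches the data, well above 1 when the model understates the variance, and well below 1 when the model's variance exceeds what the data show. This is a simplified moment-based sketch, assuming no library-size normalization and no covariates for experimental effects; it is not the estimation or GOF procedure used by the authors, and the toy counts are invented.

```python
import math
import statistics
from typing import Sequence


def fit_common_phi(genes: Sequence[Sequence[int]]) -> float:
    """Least-squares fit through the origin of (v - m) = phi * m**2 across genes.

    m and v are each gene's sample mean and sample variance; genes with
    v > m pull phi up (over-dispersion), genes with v < m pull it down.
    """
    num = den = 0.0
    for counts in genes:
        m = statistics.fmean(counts)
        v = statistics.variance(counts)
        num += m * m * (v - m)
        den += m ** 4
    if den == 0.0:
        raise ValueError("all genes have zero mean")
    # Clamp: a negative dispersion has no Negative Binomial interpretation.
    return max(num / den, 0.0)


def pearson_dispersion(counts: Sequence[int], phi: float = 0.0) -> float:
    """Pearson X^2 / (n - 1) for one gene, with the mean estimated from the data.

    phi = 0 gives the Poisson model (Var = mu); phi > 0 gives the NB model
    (Var = mu + phi * mu**2).
    """
    mu = statistics.fmean(counts)
    if mu == 0.0:
        return math.nan  # all-zero gene: the statistic is undefined
    model_var = mu + phi * mu * mu
    x2 = sum((y - mu) ** 2 for y in counts) / model_var
    return x2 / (len(counts) - 1)


if __name__ == "__main__":
    # Invented toy counts for three genes across four samples (not data from the paper).
    genes = [[0, 10, 20, 10], [5, 30, 60, 25], [90, 150, 260, 100]]
    phi = fit_common_phi(genes)
    print(f"common phi: {phi:.3f}")
    for g in genes:
        print(f"Poisson: {pearson_dispersion(g):8.2f}   NB: {pearson_dispersion(g, phi):5.2f}")
```

A single common φ is the simplest choice; gene-specific dispersions fit each gene more closely but, as the abstract notes for richer models, extra flexibility can also over-fit.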

CONCLUSIONS

These conclusions will guide the development of analytical strategies for accurately modeling the variance structure of these data and for determining sample size, which in turn will aid in identifying true biological signals that inform our understanding of biological systems.

Similar articles

Technical and biological variance structure in mRNA-Seq data: life in the real world.

BMC Genomics. 2012 Jul 7;13:304. doi: 10.1186/1471-2164-13-304.

NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.

BMC Bioinformatics. 2016 Sep 13;17(1):369. doi: 10.1186/s12859-016-1208-1.

Analyzing hospitalization data: potential limitations of Poisson regression.

Nephrol Dial Transplant. 2015 Aug;30(8):1244-9. doi: 10.1093/ndt/gfv071. Epub 2015 Mar 25.

Quasi-Poisson vs. negative binomial regression: how should we model overdispersed count data?

Ecology. 2007 Nov;88(11):2766-72. doi: 10.1890/07-0043.1.

On the use of zero-inflated and hurdle models for modeling vaccine adverse event count data.

J Biopharm Stat. 2006;16(4):463-81. doi: 10.1080/10543400600719384.

On performance of parametric and distribution-free models for zero-inflated and over-dispersed count responses.

Stat Med. 2015 Oct 30;34(24):3235-45. doi: 10.1002/sim.6560. Epub 2015 Jun 15.

Application of the Conway-Maxwell-Poisson generalized linear model for analyzing motor vehicle crashes.

Accid Anal Prev. 2008 May;40(3):1123-34. doi: 10.1016/j.aap.2007.12.003. Epub 2008 Jan 4.

Investigating the effects of the fixed and varying dispersion parameters of Poisson-gamma models on empirical Bayes estimates.

Accid Anal Prev. 2008 Jul;40(4):1441-57. doi: 10.1016/j.aap.2008.03.014. Epub 2008 Apr 18.

Modelling overdispersion and Markovian features in count data.

J Pharmacokinet Pharmacodyn. 2009 Oct;36(5):461-77. doi: 10.1007/s10928-009-9131-y. Epub 2009 Oct 2.

On the nature of over-dispersion in motor vehicle crash prediction models.

Accid Anal Prev. 2007 May;39(3):459-68. doi: 10.1016/j.aap.2006.08.002. Epub 2006 Dec 8.

Cited by

The Sum of Two Halves May Be Different from the Whole-Effects of Splitting Sequencing Samples Across Lanes.

Genes (Basel). 2022 Dec 1;13(12):2265. doi: 10.3390/genes13122265.

Patterns, Profiles, and Parsimony: Dissecting Transcriptional Signatures From Minimal Single-Cell RNA-Seq Output With SALSA.

Front Genet. 2020 Oct 9;11:511286. doi: 10.3389/fgene.2020.511286. eCollection 2020.

Stepwise large genome assembly approach: a case of Siberian larch (Larix sibirica Ledeb).

BMC Bioinformatics. 2019 Feb 5;20(Suppl 1):37. doi: 10.1186/s12859-018-2570-y.

Expression analysis of RNA sequencing data from human neural and glial cell lines depends on technical replication and normalization methods.

BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):412. doi: 10.1186/s12859-018-2382-0.

In Search of Biomarkers for Pathogenesis and Control of Leishmaniasis by Global Analyses of -Infected Macrophages.

Front Cell Infect Microbiol. 2018 Sep 19;8:326. doi: 10.3389/fcimb.2018.00326. eCollection 2018.

Metabolic network-based predictions of toxicant-induced metabolite changes in the laboratory rat.

Sci Rep. 2018 Aug 3;8(1):11678. doi: 10.1038/s41598-018-30149-7.

A Leveraged Signal-to-Noise Ratio (LSTNR) Method to Extract Differentially Expressed Genes and Multivariate Patterns of Expression From Noisy and Low-Replication RNAseq Data.

Front Genet. 2018 May 16;9:176. doi: 10.3389/fgene.2018.00176. eCollection 2018.

Bayesian Inference of Allele-Specific Gene Expression Indicates Abundant Cis-Regulatory Variation in Natural Flycatcher Populations.

Genome Biol Evol. 2017 May 1;9(5):1266-1279. doi: 10.1093/gbe/evx080.

Diverse Non-genetic, Allele-Specific Expression Effects Shape Genetic Architecture at the Cellular Level in the Mammalian Brain.

Neuron. 2017 Mar 8;93(5):1094-1109.e7. doi: 10.1016/j.neuron.2017.01.033. Epub 2017 Feb 23.

Gene signatures associated with adaptive humoral immunity following seasonal influenza A/H1N1 vaccination.

Genes Immun. 2016 Dec;17(7):371-379. doi: 10.1038/gene.2016.34. Epub 2016 Aug 18.
