Technical and biological variance structure in mRNA-Seq data: life in the real world.

Affiliation

Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN 55905, USA.

Publication information

BMC Genomics. 2012 Jul 7;13:304. doi: 10.1186/1471-2164-13-304.

Abstract

BACKGROUND

mRNA expression data from next-generation sequencing platforms are obtained in the form of counts per gene or exon. Counts have classically been assumed to follow a Poisson distribution, in which the variance is equal to the mean. The Negative Binomial distribution, which allows for over-dispersion (i.e., for the variance to be greater than the mean), is also commonly used to model count data.
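The two variance assumptions described above can be written out explicitly. The notation here is an assumption made for illustration and does not appear in the abstract: Y is the count for one gene, μ its mean, and φ a dispersion parameter.

```latex
\begin{align}
  \text{Poisson:} \quad & \operatorname{Var}(Y) = \mu \\
  \text{Negative Binomial:} \quad & \operatorname{Var}(Y) = \mu + \phi\,\mu^{2}, \qquad \phi \ge 0
\end{align}
```

Under this parameterization, φ = 0 recovers the Poisson case, and any φ > 0 gives a variance that exceeds the mean and grows quadratically with it.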

RESULTS

In mRNA-Seq data from 25 subjects, we found that technical variation generally followed a Poisson distribution, as has been reported previously, whereas biological variability was over-dispersed relative to the Poisson model. The mean-variance relationship across all genes was quadratic, in keeping with a Negative Binomial (NB) distribution. Over-dispersed Poisson and NB distributional assumptions demonstrated marked improvements in goodness-of-fit (GOF) over the standard Poisson model assumptions, but with evidence of over-fitting in some genes. Modeling of experimental effects improved GOF for high-variance genes but increased the over-fitting problem.
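As a rough illustration of what "over-dispersed relative to the Poisson model" means for a single gene, the sketch below compares the sample variance with the mean and derives a method-of-moments estimate of an NB dispersion, assuming the common parameterization variance = mean + phi * mean^2. This is not the authors' analysis: the function name `dispersion_summary` and the example counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def dispersion_summary(counts):
    """Return (mean, sample variance, variance/mean ratio, NB dispersion estimate).

    Assumes the NB parameterization variance = mean + phi * mean**2 and
    estimates phi by the method of moments, truncated at zero.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two samples to estimate a variance")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("mean count is zero; dispersion is undefined")
    # Unbiased sample variance (divides by n - 1).
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    # A ratio near 1 is what a Poisson model predicts; well above 1
    # suggests over-dispersion.
    ratio = var / mean
    # Method-of-moments dispersion: solve var = mean + phi * mean**2 for phi.
    phi = max(0.0, (var - mean) / mean ** 2)
    return mean, var, ratio, phi


# Hypothetical counts for one gene across four samples (not from the paper).
narrow = [10, 12, 8, 10]
wide = [2, 18, 10, 10]

for label, counts in (("narrow", narrow), ("wide", wide)):
    mean, var, ratio, phi = dispersion_summary(counts)
    print(f"{label}: mean={mean:.1f} var={var:.2f} ratio={ratio:.2f} phi={phi:.3f}")
```

With only a handful of samples, such per-gene estimates are very noisy, which is one reason dispersion is usually examined across all genes, as the mean-variance relationship above is.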

CONCLUSIONS

These conclusions will guide the development of analytical strategies for accurately modeling the variance structure in these data and for determining sample size, which in turn will aid in identifying true biological signals that inform our understanding of biological systems.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cf1/3505161/77ccc86352fb/1471-2164-13-304-1.jpg
