Broad distribution spectrum from Gaussian to power law appears in stochastic variations in RNA-seq data.

Affiliations

Department of Mathematical and Life Sciences, Hiroshima University, Kagamiyama 1-3-1, Higashi-Hiroshima, Hiroshima, 739-8526, Japan.

Research Center for Mathematics on Chromatin Live Dynamics, Hiroshima University, Kagamiyama 1-3-1, Higashi-Hiroshima, Hiroshima, 739-8526, Japan.

Publication information

Sci Rep. 2018 May 29;8(1):8339. doi: 10.1038/s41598-018-26735-4.

Abstract

Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution, although the physiological basis of this assumption remains unclear. In this study, RNA-seq data were obtained from Arabidopsis thaliana under eight conditions (21-27 replicates), and the characteristics of gene-dependent empirical probability density function (ePDF) profiles of gene expression levels were analyzed. For A. thaliana and Saccharomyces cerevisiae, various types of ePDF of gene expression levels were obtained that were classified as Gaussian, power law-like containing a long tail, or intermediate. These ePDF profiles were well fitted with a Gauss-power mixing distribution function derived from a simple model of a stochastic transcriptional network containing a feedback loop. The fitting function suggested that gene expression levels with long-tailed ePDFs would be strongly influenced by feedback regulation. Furthermore, the features of gene expression levels are correlated with their functions, with the levels of essential genes tending to follow a Gaussian-like ePDF while those of genes encoding nucleic acid-binding proteins and transcription factors exhibit long-tailed ePDF.
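As a rough illustration of what an ePDF is and how a Gaussian-like profile differs from a long-tailed one, the sketch below builds a histogram-based density from standardized expression values and uses excess kurtosis as a crude tail indicator. It is not the authors' pipeline: the synthetic samples, the sample size, the bin count, and the kurtosis criterion are all assumptions made for illustration, and the Gauss-power mixing function from the paper is not reproduced here because the abstract does not state it.

```python
import numpy as np


def empirical_pdf(values, bins=20):
    """Histogram-based density of standardized values (a simple ePDF sketch)."""
    values = np.asarray(values, dtype=float)
    # Standardize so that profiles of different genes are comparable.
    z = (values - values.mean()) / values.std(ddof=1)
    density, edges = np.histogram(z, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density


def excess_kurtosis(values):
    """Excess kurtosis: near 0 for a Gaussian sample, large for heavy tails."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 4) - 3.0)


# Synthetic stand-ins for expression levels (assumption: not real RNA-seq data,
# and far more points than the 21-27 replicates available per condition).
rng = np.random.default_rng(0)
gaussian_like = rng.normal(loc=100.0, scale=10.0, size=5000)
long_tailed = 100.0 + 10.0 * rng.pareto(a=2.5, size=5000)  # heavy right tail

for name, sample in [("gaussian_like", gaussian_like), ("long_tailed", long_tailed)]:
    centers, density = empirical_pdf(sample)
    print(name, "excess kurtosis:", round(excess_kurtosis(sample), 2))
```

Kurtosis is only a convenient summary here; deciding between Gaussian, intermediate, and power law-like profiles on real data would need a proper fit to the distribution function used in the paper.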
