

Identifying stably expressed genes from multiple RNA-Seq data sets.

Author information

Zhuo Bin, Emerson Sarah, Chang Jeff H, Di Yanming

Affiliations

Department of Statistics, Oregon State University, Corvallis, OR, United States.

Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States; Molecular and Cellular Biology Graduate Program, Oregon State University, Corvallis, OR, United States of America.

Publication information

PeerJ. 2016 Dec 20;4:e2791. doi: 10.7717/peerj.2791. eCollection 2016.

Abstract

We examined RNA-Seq data on 211 biological samples from 24 different Arabidopsis experiments carried out by different labs. We grouped the samples according to tissue types, and in each of the groups, we identified genes that are stably expressed across biological samples, treatment conditions, and experiments. We fit a Poisson log-linear mixed-effect model to the read counts for each gene and decomposed the total variance into between-sample, between-treatment and between-experiment variance components. Identifying stably expressed genes is useful for count normalization and differential expression analysis. The variance component analysis that we explore here is a first step towards understanding the sources and nature of the RNA-Seq count variation. When using a numerical measure to identify stably expressed genes, the outcome depends on multiple factors: the background sample set and the reference gene set used for count normalization, the technology used for measuring gene expression, and the specific numerical stability measure used. Since differential expression (DE) is measured by relative frequencies, we argue that DE is a relative concept. We advocate using an explicit reference gene set for count normalization to improve interpretability of DE results, and recommend using a common reference gene set when analyzing multiple RNA-Seq experiments to avoid potential inconsistent conclusions.
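The abstract's argument that differential expression is a relative concept, and that the choice of reference gene set affects DE conclusions, can be made concrete with a toy calculation. The sketch below assumes a deliberately simple normalization in which each sample's factor is the summed read count over an explicit reference gene set. The function names (`reference_size_factors`, `relative_fold_change`) and the counts are hypothetical illustrations, not the paper's normalization procedure or data.

```python
from typing import Dict, List, Sequence

# Hypothetical toy read counts: gene -> [count in sample 0, count in sample 1].
toy_counts: Dict[str, List[int]] = {
    "g1": [100, 200],
    "g2": [100, 100],
    "g3": [100, 100],
    "g4": [100, 400],
}


def reference_size_factors(
    counts: Dict[str, List[int]], reference: Sequence[str]
) -> List[float]:
    """Per-sample factor = total count over an explicit reference gene set.

    Assumption: this simple summed-reference scheme is illustrative only.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(counts[g][j] for g in reference) for j in range(n_samples)]
    if any(t == 0 for t in totals):
        raise ValueError("a sample has zero reference count; cannot normalize")
    return [float(t) for t in totals]


def relative_fold_change(
    counts: Dict[str, List[int]], gene: str, reference: Sequence[str], a: int, b: int
) -> float:
    """Fold change of `gene` from sample a to sample b, relative to `reference`."""
    f = reference_size_factors(counts, reference)
    return (counts[gene][b] / f[b]) / (counts[gene][a] / f[a])


if __name__ == "__main__":
    # Against a small "stable" reference set, g1 looks 2-fold up...
    print(relative_fold_change(toy_counts, "g1", ["g2", "g3"], 0, 1))  # 2.0
    # ...but against all genes (including the strongly rising g4), it looks unchanged.
    print(relative_fold_change(toy_counts, "g1", list(toy_counts), 0, 1))  # 1.0
```

In this toy example the same gene appears 2-fold up relative to one reference set and unchanged relative to another, which illustrates why the authors recommend an explicit, and when comparing experiments a common, reference gene set for count normalization.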


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce1a/5178351/d6ebeaf00b79/peerj-04-2791-g001.jpg
