Identifying stably expressed genes from multiple RNA-Seq data sets.

Authors

Zhuo Bin, Emerson Sarah, Chang Jeff H, Di Yanming

Affiliations

Department of Statistics, Oregon State University, Corvallis, OR, United States.

Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States; Molecular and Cellular Biology Graduate Program, Oregon State University, Corvallis, OR, United States of America.

Publication information

PeerJ. 2016 Dec 20;4:e2791. doi: 10.7717/peerj.2791. eCollection 2016.

DOI:10.7717/peerj.2791

PMID:28028467

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5178351/

Abstract

We examined RNA-Seq data on 211 biological samples from 24 different Arabidopsis experiments carried out by different labs. We grouped the samples according to tissue types, and in each of the groups, we identified genes that are stably expressed across biological samples, treatment conditions, and experiments. We fit a Poisson log-linear mixed-effect model to the read counts for each gene and decomposed the total variance into between-sample, between-treatment and between-experiment variance components. Identifying stably expressed genes is useful for count normalization and differential expression analysis. The variance component analysis that we explore here is a first step towards understanding the sources and nature of the RNA-Seq count variation. When using a numerical measure to identify stably expressed genes, the outcome depends on multiple factors: the background sample set and the reference gene set used for count normalization, the technology used for measuring gene expression, and the specific numerical stability measure used. Since differential expression (DE) is measured by relative frequencies, we argue that DE is a relative concept. We advocate using an explicit reference gene set for count normalization to improve interpretability of DE results, and recommend using a common reference gene set when analyzing multiple RNA-Seq experiments to avoid potential inconsistent conclusions.
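The claim that DE is a relative concept can be illustrated with a small sketch. The code below is not the authors' method: it is a toy example with hypothetical gene names and made-up counts, and the helper functions `normalize_to_reference` and `log2_fold_change` are invented here for illustration. It scales each sample by the total count of an explicit reference gene set and shows that the same gene yields a different log fold change when the reference set changes.

```python
import math


def normalize_to_reference(counts, reference_genes):
    """Express each gene's count relative to the total count of the reference set."""
    ref_total = sum(counts[g] for g in reference_genes)
    if ref_total <= 0:
        raise ValueError("reference gene set has zero total count")
    return {gene: c / ref_total for gene, c in counts.items()}


def log2_fold_change(sample_a, sample_b, gene, reference_genes):
    """log2 ratio (b over a) of one gene's reference-normalized expression."""
    a = normalize_to_reference(sample_a, reference_genes)[gene]
    b = normalize_to_reference(sample_b, reference_genes)[gene]
    return math.log2(b / a)


# Hypothetical toy counts (not data from the paper).
control = {"geneA": 100, "geneB": 100, "geneC": 100, "geneD": 100}
treated = {"geneA": 100, "geneB": 100, "geneC": 200, "geneD": 400}

# Same gene, same counts, two different reference gene sets.
lfc_ref_ab = log2_fold_change(control, treated, "geneD", ["geneA", "geneB"])
lfc_ref_c = log2_fold_change(control, treated, "geneD", ["geneC"])

print(f"geneD log2FC relative to geneA+geneB: {lfc_ref_ab:.1f}")
print(f"geneD log2FC relative to geneC:       {lfc_ref_c:.1f}")
```

In this toy setting geneD appears four-fold up relative to geneA and geneB but only two-fold up relative to geneC, which is why stating the reference gene set explicitly, and sharing it across experiments, makes DE results comparable.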

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce1a/5178351/d6ebeaf00b79/peerj-04-2791-g001.jpg
