A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq Data.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):442-454. doi: 10.1109/TCBB.2018.2790918. Epub 2018 Jan 8.

DOI:10.1109/TCBB.2018.2790918

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6686202/

Abstract

RNA sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Because RNA-seq measurements are relative in nature, between-sample normalization is an essential step in differential expression (DE) analysis. In existing DE detection algorithms, normalization is usually ad hoc and performed only once before DE detection, which may be suboptimal: ideally, normalization should be based on non-DE genes only and therefore coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and estimated jointly with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or a mixed L1/L2 penalty for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and solve it with the augmented Lagrangian method. Simulation and real-data studies show that the proposed model and algorithms perform better than or comparably to existing methods in terms of detection power and false-positive rate. The performance gain grows with larger sample size or higher signal-to-noise ratio, and is more pronounced when a large proportion of genes are differentially expressed in an asymmetric manner.
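To make the idea of coupling normalization with DE detection concrete, here is a minimal sketch for a two-group design. The abstract does not state the model equations, so the form used here is an assumption: log-scale expression y[i, j] is modeled as alpha[i] + d[j] + beta[i] * x[j], where d[j] is the sample-specific normalization factor, x[j] is a 0/1 condition indicator, and the objective is half the residual sum of squares plus lam times the sum of |beta[i]|. The function name, the parameter `lam`, and the solver are all illustrative; in particular, the sketch uses simple block coordinate descent rather than the augmented Lagrangian method named in the abstract.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def joint_normalize_and_detect(y, x, lam, n_iter=200, tol=1e-8):
    """Fit y[i, j] ~ alpha[i] + d[j] + beta[i] * x[j] with an L1 penalty on beta.

    y   : (genes, samples) array of log-scale expression values
    x   : (samples,) 0/1 condition indicator (both groups must be present)
    lam : L1 penalty weight (hypothetical tuning parameter)
    Returns (alpha, d, beta); genes with beta != 0 are the DE calls.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n_genes, n_samples = y.shape
    xc = x - x.mean()            # centered design, shared by all genes
    xx = xc @ xc
    alpha = y.mean(axis=1)
    beta = np.zeros(n_genes)
    d = np.zeros(n_samples)
    for _ in range(n_iter):
        beta_old = beta.copy()
        # Block 1: sample factors given (alpha, beta), centered so that
        # d sums to zero (alpha absorbs the overall level).
        d = (y - alpha[:, None] - np.outer(beta, x)).mean(axis=0)
        d -= d.mean()
        # Block 2: per-gene lasso with an unpenalized intercept; with a
        # single predictor this has a closed-form soft-threshold solution.
        r = y - d[None, :]
        rc = r - r.mean(axis=1, keepdims=True)
        beta = soft_threshold(rc @ xc, lam) / xx
        alpha = r.mean(axis=1) - beta * x.mean()
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return alpha, d, beta
```

Because the penalty pushes most beta[i] to exactly zero, the normalization factors d[j] end up being driven mainly by genes that are not called DE, which is the coupling the abstract argues for. Choosing `lam` (for example by cross-validation or a target false-positive rate) is left open here.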

Similar articles

Joint between-sample normalization and differential expression detection through ℓ0-regularized regression.

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):593. doi: 10.1186/s12859-019-3070-4.

A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data.

PLoS One. 2017 May 1;12(5):e0176185. doi: 10.1371/journal.pone.0176185. eCollection 2017.

Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies.

BMC Genomics. 2020 Jan 28;21(1):75. doi: 10.1186/s12864-020-6502-7.

Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions.

Brief Bioinform. 2018 Sep 28;19(5):776-792. doi: 10.1093/bib/bbx008.

A Zipf-plot based normalization method for high-throughput RNA-seq data.

PLoS One. 2020 Apr 9;15(4):e0230594. doi: 10.1371/journal.pone.0230594. eCollection 2020.

Benchmarking differential expression analysis tools for RNA-Seq: normalization-based vs. log-ratio transformation-based methods.

BMC Bioinformatics. 2018 Jul 18;19(1):274. doi: 10.1186/s12859-018-2261-8.

How does normalization impact RNA-seq disease diagnosis?

J Biomed Inform. 2018 Sep;85:80-92. doi: 10.1016/j.jbi.2018.07.016. Epub 2018 Jul 21.

Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads.

BMC Genomics. 2015;16 Suppl 7(Suppl 7):S14. doi: 10.1186/1471-2164-16-S7-S14. Epub 2015 Jun 11.

A Hypothesis Testing Based Method for Normalization and Differential Expression Analysis of RNA-Seq Data.

PLoS One. 2017 Jan 10;12(1):e0169594. doi: 10.1371/journal.pone.0169594. eCollection 2017.

Cited by

Cell type identification from single-cell transcriptomes in melanoma.

BMC Med Genomics. 2021 Nov 17;14(Suppl 5):263. doi: 10.1186/s12920-021-01118-3.

Regulation of gene expression in the bovine blastocyst by colony-stimulating factor 2 is disrupted by CRISPR/Cas9-mediated deletion of CSF2RA.

Biol Reprod. 2021 May 7;104(5):995-1007. doi: 10.1093/biolre/ioab015.

Joint between-sample normalization and differential expression detection through ℓ0-regularized regression.

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):593. doi: 10.1186/s12859-019-3070-4.
