Joint between-sample normalization and differential expression detection through ℓ₀-regularized regression.

Affiliations

Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA, 19104, USA.

Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48109, USA.

Publication information

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):593. doi: 10.1186/s12859-019-3070-4.

DOI:10.1186/s12859-019-3070-4

PMID:31787074

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6886201/

Abstract

BACKGROUND

A fundamental problem in RNA-seq data analysis is to identify, from read counts, the genes or exons that are differentially expressed across experimental conditions. Because RNA-seq measurements are relative, between-sample normalization of read counts is an essential step in differential expression (DE) analysis. In most existing methods, normalization is performed before the DE analysis. Recently, Jiang and Zhan proposed a statistical method that introduces sample-specific normalization parameters into a joint model, allowing simultaneous normalization and differential expression analysis of log-transformed RNA-seq data. In addition, an ℓ₀ penalty is used to yield a sparse solution that selects a subset of DE genes. In their work, the experimental conditions are restricted to be categorical.
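
For concreteness, a joint model of this kind can be written as a penalized least-squares problem. The notation below is assumed for illustration and is not taken from the abstract: y_{ij} is the log-transformed read count of gene i in sample j, x_j encodes the experimental condition of sample j, \alpha_i is a gene-specific baseline, \beta_i is the condition effect for gene i, d_j is the sample-specific normalization offset, and \lambda controls sparsity.

```latex
\min_{\alpha,\,\beta,\,d}\;
\sum_{i=1}^{m}\sum_{j=1}^{n}
\bigl(y_{ij}-\alpha_i-\beta_i x_j-d_j\bigr)^2
+\lambda\,\lVert\beta\rVert_0,
\qquad
\lVert\beta\rVert_0=\#\{\,i:\beta_i\neq 0\,\}.
```

Under this reading, genes with \beta_i \neq 0 are the ones reported as DE, and the offsets d_j are estimated together with the effects rather than in a separate preprocessing step.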

RESULTS

In this paper, we generalize Jiang and Zhan's method to handle experimental conditions that are measured as continuous variables. As a result, genes whose expression levels are associated with one or more covariates can be detected. Because the problem is high-dimensional, non-differentiable and non-convex, we develop an efficient algorithm for model fitting.
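
The abstract does not describe the fitting algorithm itself. As a rough illustration of what joint estimation with a single continuous covariate can look like, here is a minimal sketch of a simple alternating-minimization heuristic. It is not the authors' algorithm; the function name `fit_joint_l0`, the objective written in its docstring, the median-based initialization, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def fit_joint_l0(Y, x, lam, n_iter=50):
    """Heuristic alternating minimization (illustrative assumption) of
        sum_ij (Y[i,j] - alpha[i] - beta[i]*x[j] - d[j])**2 + lam * ||beta||_0
    Y: (genes, samples) log-expression matrix; x: (samples,) continuous covariate.
    """
    xc = x - x.mean()                       # centered covariate
    sxx = np.sum(xc ** 2)
    # Initialize offsets with a median across genes (assumes most genes are not DE).
    d = np.median(Y - Y.mean(axis=1, keepdims=True), axis=0)
    for _ in range(n_iter):
        R = Y - d                           # remove current sample offsets
        # Per-gene least-squares slope on the centered covariate.
        slope = (R - R.mean(axis=1, keepdims=True)) @ xc / sxx
        # A nonzero beta[i] lowers gene i's residual sum of squares by
        # slope[i]**2 * sxx; keep it only when that gain exceeds lam.
        beta = np.where(slope ** 2 * sxx > lam, slope, 0.0)
        alpha = R.mean(axis=1) - beta * x.mean()
        # Offsets: mean residual per sample given alpha and beta.
        d = (Y - alpha[:, None] - np.outer(beta, x)).mean(axis=0)
        # Center d for identifiability and move the shift into alpha.
        shift = d.mean()
        d -= shift
        alpha += shift
    return alpha, beta, d

# Synthetic usage: 200 genes, 30 samples, first 20 genes associated with x.
rng = np.random.default_rng(0)
m, n = 200, 30
x = np.linspace(0.0, 1.0, n)                # hypothetical continuous covariate
beta_true = np.zeros(m)
beta_true[:20] = 3.0                        # all effects positive (asymmetric DE)
alpha_true = rng.normal(5.0, 1.0, m)
d_true = rng.normal(0.0, 1.0, n)            # sample-specific offsets
Y = (alpha_true[:, None] + np.outer(beta_true, x) + d_true
     + rng.normal(0.0, 0.1, (m, n)))

alpha_hat, beta_hat, d_hat = fit_joint_l0(Y, x, lam=1.0)
print("selected genes:", np.flatnonzero(beta_hat))
```

Each half-step is an exact minimizer of the stated objective with the other block held fixed, so the objective never increases, but because the problem is non-convex the result depends on the starting point. A single offset shared across all genes can also trade off against a common slope, which is why the sparsity penalty matters for separating normalization from genuine association.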

CONCLUSIONS

Experiments on synthetic data demonstrate that the proposed method outperforms existing methods in terms of detection accuracy when a large fraction of genes are differentially expressed in an asymmetric manner, and the performance gain becomes more substantial for larger sample sizes. We also apply our method to a real prostate cancer RNA-seq dataset to identify genes associated with pre-operative prostate-specific antigen (PSA) levels in patients.
