Joint between-sample normalization and differential expression detection through ℓ₀-regularized regression.

Affiliations

Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA, 19104, USA.

Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48109, USA.

Publication information

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):593. doi: 10.1186/s12859-019-3070-4.

Abstract

BACKGROUND

A fundamental problem in RNA-seq data analysis is to identify, from read counts, genes or exons that are differentially expressed across experimental conditions. Because RNA-seq measurements are relative, between-sample normalization of read counts is an essential step in differential expression (DE) analysis. In most existing methods, normalization is performed before the DE analysis. Recently, Jiang and Zhan proposed a statistical method that introduces sample-specific normalization parameters into a joint model, allowing simultaneous normalization and DE analysis of log-transformed RNA-seq data. In addition, an ℓ₀ penalty is used to yield a sparse solution that selects a subset of genes as DE. Their work restricts the experimental conditions to categorical variables.

RESULTS

In this paper, we generalize Jiang and Zhan's method to handle experimental conditions measured as continuous variables. As a result, genes whose expression levels are associated with one or more covariates can be detected. Because the problem is high-dimensional, non-differentiable and non-convex, we develop an efficient algorithm for model fitting.
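
As a rough illustration of what a joint normalization and DE model with a continuous covariate can look like, the sketch below writes one plausible objective. It is not taken from this abstract, and every symbol in it is assumed notation: y_{ij} is the log-transformed expression of gene i in sample j, x_j is a single continuous covariate, α_i is a gene-specific intercept, β_i is the gene's covariate effect, d_j is the sample-specific normalization parameter, and λ is a tuning parameter.

```latex
% Hypothetical notation, for illustration only (not the authors' stated formulation):
%   y_{ij}  log-transformed expression of gene i in sample j
%   x_j     continuous covariate for sample j
%   d_j     sample-specific normalization parameter
\begin{equation*}
  \min_{\alpha,\,\beta,\,d}\;
  \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n}
  \bigl(y_{ij}-\alpha_i-d_j-\beta_i x_j\bigr)^{2}
  \;+\;\lambda\sum_{i=1}^{m}\mathbf{1}\{\beta_i\neq 0\}
\end{equation*}
% Without the penalty the fit is not identifiable: for any constant c,
%   \beta_i \mapsto \beta_i + c, \qquad d_j \mapsto d_j - c\,x_j
% leaves every residual y_{ij}-\alpha_i-d_j-\beta_i x_j unchanged.
```

Under these assumptions, the penalty counts the genes with a nonzero β_i, so it favors solutions in which most genes are not DE, and that preference is what separates the normalization parameters d_j from the covariate effects. The indicator term is also what makes such an objective non-differentiable and non-convex.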

CONCLUSIONS

Experiments on synthetic data demonstrate that the proposed method outperforms existing methods in terms of detection accuracy when a large fraction of genes are differentially expressed in an asymmetric manner, and the performance gain becomes more substantial for larger sample sizes. We also apply our method to a real prostate cancer RNA-seq dataset to identify genes associated with pre-operative prostate-specific antigen (PSA) levels in patients.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f90/6886201/d7b4fc139369/12859_2019_3070_Fig1_HTML.jpg
