A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq Data.

Publication Information

IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):442-454. doi: 10.1109/TCBB.2018.2790918. Epub 2018 Jan 8.

Abstract

RNA sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Because RNA-seq measurements are relative in nature, between-sample normalization is an essential step in differential expression (DE) analysis. In existing DE detection algorithms, the normalization step is usually ad hoc and performed only once, prior to DE detection. This may be suboptimal, since normalization should ideally be based on non-DE genes only and is thus coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and estimated jointly with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or a mixed L1/L2 penalty for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and apply the augmented Lagrangian method to solve it. Simulation and real-data studies show that the proposed model and algorithms perform better than or comparably to existing methods in terms of detection power and false-positive rate. The performance gain increases with larger sample size or higher signal-to-noise ratio, and is more pronounced when a large proportion of genes are differentially expressed in an asymmetric manner.
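
To make the idea of coupling normalization with DE detection concrete, the following is a minimal sketch, not the authors' implementation. It assumes log-scale expression values, a single two-group treatment indicator, and the gene-wise model y[g, i] = alpha[g] + d[i] + beta[g] * x[i] + noise, where d[i] plays the role of the sample-specific normalization factor and beta[g] is the sparse DE effect. The abstract states that the paper solves its penalized least-squares problem with the augmented Lagrangian method; this sketch swaps in plain block coordinate descent with soft-thresholding because it is shorter. The function names, the parameter `lam`, and the simulated data are all hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def joint_normalize_and_detect(Y, x, lam, n_iter=200):
    """Fit y[g, i] ~ alpha[g] + d[i] + beta[g] * xc[i] with an L1 penalty on beta.

    Y   : (genes, samples) array of log-scale expression values (assumed input)
    x   : (samples,) 0/1 treatment indicator, must not be constant
    lam : L1 penalty weight (hypothetical tuning parameter)
    Returns gene intercepts alpha, normalization factors d, DE effects beta.
    """
    Y = np.asarray(Y, dtype=float)
    xc = np.asarray(x, dtype=float)
    xc = xc - xc.mean()  # centering decouples the intercepts from beta
    sxx = np.sum(xc ** 2)
    n_genes, n_samples = Y.shape
    alpha = np.zeros(n_genes)
    d = np.zeros(n_samples)
    beta = np.zeros(n_genes)
    for _ in range(n_iter):
        # Gene intercepts: exact least-squares update given d (xc sums to zero).
        alpha = (Y - d[None, :]).mean(axis=1)
        # DE effects: closed-form lasso update for each gene.
        R = Y - alpha[:, None] - d[None, :]
        beta = soft_threshold(R @ xc, lam) / sxx
        # Normalization factors: exact least-squares update given alpha and beta.
        d = (Y - alpha[:, None] - np.outer(beta, xc)).mean(axis=0)
        # Remove the shift ambiguity between alpha and d (d sums to zero).
        shift = d.mean()
        d -= shift
        alpha += shift
    return alpha, d, beta


if __name__ == "__main__":
    # Simulated data: 200 genes, 10 samples, 20 genes up-regulated (asymmetric DE).
    rng = np.random.default_rng(0)
    x = np.array([0] * 5 + [1] * 5)
    beta_true = np.zeros(200)
    beta_true[:20] = 3.0
    d_true = rng.normal(0.0, 0.5, size=10)
    Y = (rng.normal(5.0, 1.0, size=200)[:, None] + d_true[None, :]
         + np.outer(beta_true, x) + rng.normal(0.0, 0.1, size=(200, 10)))
    alpha, d, beta = joint_normalize_and_detect(Y, x, lam=1.0)
    print("genes flagged as DE:", np.flatnonzero(beta))
```

Because the normalization factors are re-estimated after every update of the sparse effects, genes currently flagged as DE contribute less to the estimate of d than they would in a one-off normalization step, which is the coupling the abstract describes. The mixed L1/L2 penalty for multiple treatment conditions would replace the scalar soft-threshold with a group-wise shrinkage step and is not shown here.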
