Modeling overdispersion heterogeneity in differential expression analysis using mixtures.

Author information

Bonafede Elisabetta, Picard Franck, Robin Stéphane, Viroli Cinzia

Affiliations

Department of Statistical Sciences, University of Bologna, 40126 Italy.

Laboratoire de Biométrie et Biologie Évolutive, UMR CNRS 5558 Univ. Lyon 1, F-69622 Villeurbanne, France.

Publication information

Biometrics. 2016 Sep;72(3):804-14. doi: 10.1111/biom.12458. Epub 2015 Dec 18.

DOI:10.1111/biom.12458

PMID:26683201

Abstract

Next-generation sequencing technologies now constitute a method of choice to measure gene expression. Data to analyze are read counts, commonly modeled using negative binomial distributions. A relevant issue associated with this probabilistic framework is the reliable estimation of the overdispersion parameter, reinforced by the limited number of replicates generally observable for each gene. Many strategies have been proposed to estimate this parameter, but when differential analysis is the purpose, they often result in procedures based on plug-in estimates, and we show here that this discrepancy between the estimation framework and the testing framework can lead to uncontrolled type-I errors. Instead, we propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are developed for differential expression analysis. We show through a wide simulation study that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it reaches the nominal value for the type-I error while maintaining high discriminative power between differentially and non-differentially expressed genes. The method is finally illustrated on prostate cancer RNA-Seq data.
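To make the idea of genes sharing information through dispersion components more concrete, the following is a minimal sketch and not the authors' estimation procedure. It assumes the common mean/overdispersion parametrization of the negative binomial, with Var(Y) = mu + phi * mu^2, and a two-component mixture over phi. The component dispersions, mixture weights, gene mean, and replicate counts are all hypothetical values chosen for illustration; the function names `nb_logpmf` and `responsibilities` are likewise made up here.

```python
import math


def nb_logpmf(y, mu, phi):
    """Log-pmf of a negative binomial with mean mu and overdispersion phi,
    parametrized so that Var(Y) = mu + phi * mu**2."""
    r = 1.0 / phi  # "size" parameter of the negative binomial
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def responsibilities(counts, mu, phis, weights):
    """Posterior probability that one gene's replicate counts come from
    each dispersion component (the gene mean mu is treated as known)."""
    log_post = [
        math.log(w) + sum(nb_logpmf(y, mu, phi) for y in counts)
        for phi, w in zip(phis, weights)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical mixture: a low-dispersion and a high-dispersion component.
phis = [0.01, 0.5]
weights = [0.5, 0.5]
mu = 100.0

# Hypothetical genes with three replicates each.
stable_gene = [98, 102, 100]
noisy_gene = [20, 100, 180]

post_stable = responsibilities(stable_gene, mu, phis, weights)
post_noisy = responsibilities(noisy_gene, mu, phis, weights)

print("stable gene:", post_stable)
print("noisy gene: ", post_noisy)
```

With only three replicates, a per-gene dispersion estimate would be very noisy. In this toy setting each gene is instead assigned, with a posterior weight, to a dispersion value shared by genes of similar variability. A full mixture fit would also estimate the component dispersions, weights, and gene means, typically with an EM-type algorithm, which this sketch omits.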

Similar articles

Modeling overdispersion heterogeneity in differential expression analysis using mixtures.

Biometrics. 2016 Sep;72(3):804-14. doi: 10.1111/biom.12458. Epub 2015 Dec 18.

A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data.

PLoS One. 2017 May 1;12(5):e0176185. doi: 10.1371/journal.pone.0176185. eCollection 2017.

Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model.

Stat Appl Genet Mol Biol. 2019 Jan 22;18(1):/j/sagmb.2019.18.issue-1/sagmb-2018-0021/sagmb-2018-0021.xml. doi: 10.1515/sagmb-2018-0021.

Detection of high variability in gene expression from single-cell RNA-seq profiling.

BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):508. doi: 10.1186/s12864-016-2897-6.

A fuzzy method for RNA-Seq differential expression analysis in presence of multireads.

BMC Bioinformatics. 2016 Nov 8;17(Suppl 12):345. doi: 10.1186/s12859-016-1195-2.

LPEseq: Local-Pooled-Error Test for RNA Sequencing Experiments with a Small Number of Replicates.

PLoS One. 2016 Aug 17;11(8):e0159182. doi: 10.1371/journal.pone.0159182. eCollection 2016.

Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis.

Brief Bioinform. 2018 Jan 1;19(1):65-76. doi: 10.1093/bib/bbw092.

Differentially expressed heterogeneous overdispersion genes testing for count data.

PLoS One. 2024 Jul 17;19(7):e0300565. doi: 10.1371/journal.pone.0300565. eCollection 2024.

Statistical detection of differentially expressed genes based on RNA-seq: from biological to phylogenetic replicates.

Brief Bioinform. 2016 Mar;17(2):243-8. doi: 10.1093/bib/bbv035. Epub 2015 Jun 24.

The level of residual dispersion variation and the power of differential expression tests for RNA-Seq data.

PLoS One. 2015 Apr 7;10(4):e0120117. doi: 10.1371/journal.pone.0120117. eCollection 2015.

Cited by

Detection of genes with differential expression dispersion unravels the role of autophagy in cancer progression.

PLoS Comput Biol. 2023 Mar 9;19(3):e1010342. doi: 10.1371/journal.pcbi.1010342. eCollection 2023 Mar.

Modelling RNA-Seq data with a zero-inflated mixture Poisson linear model.

Genet Epidemiol. 2019 Oct;43(7):786-799. doi: 10.1002/gepi.22246. Epub 2019 Jul 22.

A permutation-based non-parametric analysis of CRISPR screen data.

BMC Genomics. 2017 Jul 19;18(1):545. doi: 10.1186/s12864-017-3938-5.
