An evaluation of RNA-seq differential analysis methods.

Affiliations

Clinical and Translational Science Institute, School of Medicine and Dentistry, University of Rochester, Rochester, NY, United States of America.

Department of Medicine, Division of Nephrology, School of Medicine and Dentistry, University of Rochester, Rochester, NY, United States of America.

Publication information

PLoS One. 2022 Sep 16;17(9):e0264246. doi: 10.1371/journal.pone.0264246. eCollection 2022.

Abstract

RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods available in open-source R and Bioconductor packages, we conducted multiple simulation studies comparing the performance of eight methods (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, and Voom). The comparisons covered scenarios with equal or unequal library sizes, different distributional assumptions, and different sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences in FDR control, power, or stability were observed across methods, whether library sizes were equal or unequal. For RNA-seq count data following a negative binomial distribution with a sample size of 3 per group, EBSeq outperformed the other methods in FDR control, power, and stability. When the sample size increased to 6 or 12 per group, DESeq2 performed slightly better than the other methods. All methods except DESeq showed improved performance when the sample size increased to 12 per group. For RNA-seq count data following a log-normal distribution, both DESeq and DESeq2 performed better than the other methods in FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and the stability of discoveries for each method.
For RNA-seq data that follow a negative binomial distribution, the EBSeq method is recommended for studies with as few as 3 samples per group, and the DESeq2 method is recommended for studies with 6 or more samples per group. When the data follow a log-normal distribution, both DESeq and DESeq2 are recommended.
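The abstract scores each method by FDR control and power on simulated data, where the truly differentially expressed genes are known. A minimal Python sketch of how these two metrics can be computed for one simulated dataset is given below. It assumes, for illustration only, that a method returns raw p-values which are then adjusted with the Benjamini-Hochberg procedure and called at a 0.05 cutoff; the function names `bh_adjust` and `fdr_and_power`, the gene ids, and the p-values are hypothetical, and this is not the authors' evaluation code. Methods that report posterior probabilities rather than p-values (EBSeq, for example) would need their own calling rule.

```python
from __future__ import annotations


def bh_adjust(pvalues: dict[str, float]) -> dict[str, float]:
    """Benjamini-Hochberg adjusted p-values, keyed by gene id."""
    m = len(pvalues)
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])  # ascending p-values
    adjusted: dict[str, float] = {}
    running_min = 1.0  # starting at 1.0 also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        gene, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[gene] = running_min
    return adjusted


def fdr_and_power(called: set[str], truly_de: set[str]) -> tuple[float, float]:
    """Empirical FDR (false calls / all calls) and power (true calls / truly DE genes)."""
    true_pos = len(called & truly_de)
    false_pos = len(called - truly_de)
    fdr = false_pos / len(called) if called else 0.0  # no calls: define FDR as 0
    power = true_pos / len(truly_de) if truly_de else 0.0
    return fdr, power


# Hypothetical p-values and ground truth for one simulated dataset (not from the paper).
pvals = {"g1": 0.001, "g2": 0.004, "g3": 0.02, "g4": 0.3, "g5": 0.7}
truth = {"g1", "g2", "g4"}

q = bh_adjust(pvals)
called = {gene for gene, qval in q.items() if qval < 0.05}  # assumed 0.05 cutoff
fdr, power = fdr_and_power(called, truth)
print(sorted(called), round(fdr, 3), round(power, 3))  # ['g1', 'g2', 'g3'] 0.333 0.667
```

In a simulation study these per-dataset values would typically be averaged over many simulated replicates. The abstract does not say how stability was quantified, so that metric is not sketched here.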
