Statistical methods on detecting differentially expressed genes for RNA-seq data.

Author information

Chen Zhongxue, Liu Jianzhong, Ng Hon Keung Tony, Nadarajah Saralees, Kaufman Howard L, Yang Jack Y, Deng Youping

Affiliations

Biostatistics Epidemiology Research Design Core, Center for Clinical and Translational Sciences, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Publication information

BMC Syst Biol. 2011;5 Suppl 3(Suppl 3):S1. doi: 10.1186/1752-0509-5-S3-S1. Epub 2011 Dec 23.

Abstract

BACKGROUND

For RNA-seq data, the aggregated count of short reads from the same gene is used to approximate that gene's expression level. The count data can be modelled as samples from Poisson distributions with possibly different parameters. To detect genes that are differentially expressed between two conditions, statistical methods for testing the difference between two Poisson means are used. When the expression level of a gene is low, i.e., the read count is small, it is usually more difficult to detect a difference in means, so statistical methods that are more powerful at low expression levels are particularly desirable. In the statistical literature, several methods have been proposed to compare two Poisson means (rates). In this paper, we compare these methods using simulated and real RNA-seq data.

RESULTS

Through a simulation study and real data analysis, we find that the Wald test applied to log-transformed data is more powerful than the other methods considered. The likelihood ratio test has power similar to that of the variance stabilizing transformation test, and both are more powerful than the conditional exact test and the Fisher exact test.
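To make the two leading tests concrete, the following is a minimal sketch of a Wald test on the log rate ratio and a likelihood ratio test for two Poisson counts. The abstract does not state the formulas, so these are the standard textbook forms and may differ in detail from the versions evaluated in the paper. The function names and the exposure arguments `n1` and `n2` (for example, total mapped reads per condition) are assumptions introduced for illustration.

```python
import math


def wald_log_test(x1, n1, x2, n2):
    """Two-sided Wald test on the log rate ratio of two Poisson counts.

    x1, x2: read counts for one gene under the two conditions (must be > 0).
    n1, n2: exposure per condition, e.g. total mapped reads (an assumption).
    Returns (z, p_value) using a standard normal reference distribution.
    """
    if x1 <= 0 or x2 <= 0:
        raise ValueError("counts must be positive for the log transform")
    log_ratio = math.log(x1 / n1) - math.log(x2 / n2)
    se = math.sqrt(1.0 / x1 + 1.0 / x2)  # delta-method standard error
    z = log_ratio / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


def likelihood_ratio_test(x1, n1, x2, n2):
    """Likelihood ratio test of equal Poisson rates.

    Returns (g, p_value); g is referred to a chi-square distribution
    with 1 degree of freedom.
    """
    pooled_rate = (x1 + x2) / (n1 + n2)  # rate estimate under H0

    def term(x, n):
        # A zero count contributes nothing (limit of x * log(x) as x -> 0).
        return x * math.log(x / (n * pooled_rate)) if x > 0 else 0.0

    g = max(2.0 * (term(x1, n1) + term(x2, n2)), 0.0)
    p_value = math.erfc(math.sqrt(g / 2.0))  # chi-square(1) upper tail
    return g, p_value


if __name__ == "__main__":
    # Hypothetical counts for one gene with equal exposure in both conditions.
    print(wald_log_test(20, 1.0, 10, 1.0))
    print(likelihood_ratio_test(20, 1.0, 10, 1.0))
```

The Wald statistic on the log scale is undefined when either count is zero, which is why the sketch rejects such inputs. How zero counts are handled in practice is a design choice that the abstract does not address.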

CONCLUSIONS

When the count data in RNA-seq can be reasonably modelled by a Poisson distribution, the Wald-Log test is more powerful and should be used to detect differentially expressed genes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf4e/3287564/bbf00d9040cc/1752-0509-5-S3-S1-1.jpg
