Gim Jungsoo, Won Sungho, Park Taesung
* Institute of Health and Environment, Seoul National University, Gwanak-gu Seoul, 151-747, South Korea.
† Graduate School of Public Health, Seoul National University, Gwanak-gu Seoul, 151-747, South Korea.
J Bioinform Comput Biol. 2016 Oct;14(5):1644006. doi: 10.1142/S0219720016440066. Epub 2016 Sep 15.
High-throughput sequencing technology in transcriptomics studies contributes to the understanding of gene regulation mechanisms and their cellular function, but it also increases the need for accurate statistical methods to assess quantitative differences between experiments. Many methods have been developed to account for the specifics of count data: non-normality, dependence of the variance on the mean, and small sample size. Among these, the small number of samples in typical experiments remains a challenge. Here we present a method for differential analysis of count data that uses conditional estimation of local pooled dispersion parameters. A comprehensive evaluation of the proposed method for differential gene expression analysis, using both simulated and real data sets, shows that it is more powerful than existing methods while controlling the false discovery rate. By introducing conditional estimation of local pooled dispersion parameters, we overcome the limitation of low power and enable powerful differential expression testing with a small number of samples.
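The abstract does not spell out the conditional estimator, so the following is only a rough illustration of the general idea of pooling dispersion locally, not the authors' method. It assumes a negative binomial model with Var = mu + phi * mu^2, uses a simple method-of-moments estimate of phi per gene, and then averages each gene's estimate with those of the genes closest to it in mean expression. The function names and the `window` parameter are hypothetical.

```python
from statistics import mean, variance


def mom_dispersion(counts):
    """Method-of-moments dispersion under the assumed model Var = mu + phi * mu^2."""
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # sample variance; needs at least 2 replicates
    # Negative estimates (variance below the mean) are truncated at zero.
    return max((v - m) / (m * m), 0.0)


def local_pooled_dispersion(count_rows, window=2):
    """Average each gene's raw dispersion with its neighbours in mean-expression rank.

    count_rows: one list of replicate counts per gene.
    window: number of neighbours taken on each side (hypothetical tuning knob).
    """
    means = [mean(row) for row in count_rows]
    raw = [mom_dispersion(row) for row in count_rows]
    # Gene indices sorted by mean expression, so neighbours have similar means.
    order = sorted(range(len(count_rows)), key=lambda i: means[i])
    pooled = [0.0] * len(count_rows)
    for rank, gene in enumerate(order):
        lo = max(0, rank - window)
        hi = min(len(order), rank + window + 1)
        pooled[gene] = mean([raw[i] for i in order[lo:hi]])
    return pooled


if __name__ == "__main__":
    # Toy data: three genes, three replicates each (made up for illustration).
    rows = [[10, 12, 8], [100, 150, 50], [5, 5, 5]]
    print(local_pooled_dispersion(rows, window=1))
```

With only three replicates per gene, the raw per-gene estimates are very noisy (two of the three toy genes are truncated to zero), which is the small-sample problem the abstract refers to; borrowing information from genes with similar means is one common way to stabilise them.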