Department of Statistics, University of Missouri, Columbia, MO, USA.
Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA.
Bioinformatics. 2019 Mar 1;35(5):787-797. doi: 10.1093/bioinformatics/bty731.
Several methods have been proposed for paired RNA-seq analysis. However, many of them do not account for the heterogeneity in treatment effects among pairs that can naturally arise in real data. In addition, it has been reported in the literature that some popular methods have problems controlling the false discovery rate (FDR). In this paper, we present PairedFB, a full hierarchical Bayesian model for paired RNA-seq count data that accounts for variation in treatment effects among pairs and controls the FDR through the posterior expected FDR.
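The abstract does not spell out how genes are selected under posterior expected FDR control. One standard rule works as follows: rank genes by their posterior probability of being null (not DE), then declare the largest top-ranked set whose average null probability, which equals its posterior expected FDR, does not exceed the nominal level alpha. Below is a minimal sketch of that rule, assuming the per-gene posterior null probabilities (`p_null`) have already been estimated from a fitted model. The function name `posterior_fdr_select` is hypothetical and is not part of the PairedFB software.

```python
import numpy as np


def posterior_fdr_select(p_null, alpha=0.05):
    """Select genes so that the posterior expected FDR is at most alpha.

    p_null[g] is assumed (hypothetical input) to be the posterior probability
    that gene g is NOT differentially expressed. Returns a boolean mask of the
    genes declared DE.
    """
    p_null = np.asarray(p_null, dtype=float)
    order = np.argsort(p_null)  # genes most likely to be DE come first
    # Posterior expected FDR of the top-k list = mean of its k null probabilities.
    # Because the values are sorted ascending, this running mean never decreases.
    cum_fdr = np.cumsum(p_null[order]) / np.arange(1, p_null.size + 1)
    ok = np.nonzero(cum_fdr <= alpha)[0]
    k = ok[-1] + 1 if ok.size else 0  # largest list size that meets the bound
    selected = np.zeros(p_null.size, dtype=bool)
    selected[order[:k]] = True
    return selected


if __name__ == "__main__":
    mask = posterior_fdr_select([0.01, 0.2, 0.03, 0.5, 0.02], alpha=0.05)
    print(mask.tolist())  # [True, False, True, False, True]
```

Because the running mean of ascending values is nondecreasing, taking the largest qualifying k gives the biggest gene list that still satisfies the bound.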
Our simulation studies show that most competing methods can have highly inflated FDR at small to moderate sample sizes, while PairedFB controls the FDR close to the nominal levels. Furthermore, PairedFB performs better overall than the other methods at ranking true differentially expressed genes (DEGs) at the top of the list, especially as the sample size grows or when the heterogeneity of treatment effects is high. PairedFB can also be applied to identify biologically significant DEGs with controlled FDR. The real data analysis further indicates that PairedFB tends to find more biologically relevant genes, even when the sample size is small. Finally, PairedFB is shown to be robust to model misspecification in terms of its performance relative to the other methods.
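The abstract does not say how "biologically significant" is defined. One common approach treats a gene as null whenever its treatment effect lies within a fold-change threshold. The sketch below assumes posterior draws of a gene-level log2 fold change are available (for example, from MCMC output) and estimates P(|effect| <= tau | data) as the fraction of draws that fall inside the threshold. The threshold `tau`, the draw layout, and the name `posterior_null_prob` are illustrative assumptions, not details taken from PairedFB.

```python
import numpy as np


def posterior_null_prob(draws, tau):
    """Estimate P(|effect| <= tau | data) for each gene from posterior draws.

    draws: array of shape (n_draws, n_genes) holding posterior samples of a
    gene-level log2 fold change (hypothetical quantity for illustration).
    tau: fold-change threshold below which an effect is treated as not
    biologically significant (assumed; must be positive).
    """
    draws = np.asarray(draws, dtype=float)
    if draws.ndim != 2:
        raise ValueError("draws must have shape (n_draws, n_genes)")
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Monte Carlo estimate: share of draws whose magnitude is within tau.
    return np.mean(np.abs(draws) <= tau, axis=0)


if __name__ == "__main__":
    draws = [[0.1, 1.5], [-0.2, 2.0], [0.05, 1.2], [0.3, 0.9]]
    print(posterior_null_prob(draws, tau=0.5).tolist())  # [1.0, 0.0]
```

Probabilities computed this way could then be passed to a posterior expected FDR selection rule, so that only genes with effects beyond the threshold are declared DE at the chosen FDR level.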
Software to implement this method (PairedFB) can be downloaded at: https://sites.google.com/a/udel.edu/qiujing/publication.
Supplementary data are available at Bioinformatics online.