Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA.
Mathematics Department, Bryant University, Smithfield, RI, USA.
Bioinformatics. 2019 Jul 1;35(13):2235-2242. doi: 10.1093/bioinformatics/bty952.
In the analysis of RNA-Seq data, detecting differentially expressed (DE) genes has been an active research area in recent years, and many methods have been proposed. DE genes show different average expression levels across sample groups and can therefore be important biological markers. Although these methods are generally very successful, they need to be further tailored and improved for cancer data. In such data, expression across samples from the cancer group is often highly diverse, and this diversity is much larger than in the control group.
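The following sketch makes the distinction between average expression and diversity of expression concrete. The abstract does not state a count model, so it assumes the negative binomial parameterization common in RNA-Seq tools, with mean μ and dispersion φ and Var = μ + φμ². The gene and its parameter values are hypothetical. Both groups share the same mean, so the gene is not DE, but the cancer group is far more dispersed.

```python
import math


def nb_variance(mu: float, phi: float) -> float:
    """Variance of a negative binomial count with mean mu and dispersion phi.

    Assumed parameterization (common in RNA-Seq tools): Var = mu + phi * mu**2.
    phi = 0 reduces to the Poisson case, Var = mu.
    """
    if mu < 0 or phi < 0:
        raise ValueError("mu and phi must be non-negative")
    return mu + phi * mu * mu


def coefficient_of_variation(mu: float, phi: float) -> float:
    """Standard deviation divided by the mean: a scale-free measure of diversity."""
    return math.sqrt(nb_variance(mu, phi)) / mu


# Hypothetical gene: identical mean in both groups, larger dispersion in the cancer group.
control = {"mu": 100.0, "phi": 0.1}
cancer = {"mu": 100.0, "phi": 1.0}

for label, g in (("control", control), ("cancer", cancer)):
    var = nb_variance(g["mu"], g["phi"])
    cv = coefficient_of_variation(g["mu"], g["phi"])
    print(f"{label}: mean={g['mu']:.0f} variance={var:.0f} CV={cv:.2f}")
```

Under these assumptions, a test on means alone would see no difference. The variance, however, rises from 1100 to 10100, and the coefficient of variation rises from about 0.33 to about 1.00.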
We propose DiPhiSeq, a statistical method that detects not only genes with different average expression but also genes with different diversity of expression across groups. These 'differentially dispersed' genes can be important clinical markers. Our method applies a redescending penalty to the quasi-likelihood function and therefore has superior robustness against outliers and other noise. Simulations and real data analysis demonstrate that DiPhiSeq outperforms existing methods in the presence of outliers and identifies unique sets of genes.
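The abstract does not spell out DiPhiSeq's penalty or quasi-likelihood. As a generic illustration of why a redescending function resists outliers, the sketch below substitutes Tukey's biweight, a classic redescending function. It applies the biweight to robust estimation of a single location parameter by iteratively reweighted least squares. The tuning constant c = 4.685 and the MAD-based scale are conventional choices assumed here, and none of this is DiPhiSeq's actual implementation.

```python
import statistics

C = 4.685  # conventional biweight tuning constant (assumption, not from DiPhiSeq)


def tukey_rho(r: float, c: float = C) -> float:
    """Tukey biweight loss: about r**2 / 2 near 0, constant c**2 / 6 once |r| > c."""
    if abs(r) <= c:
        u = (r / c) ** 2
        return c * c / 6.0 * (1.0 - (1.0 - u) ** 3)
    return c * c / 6.0


def tukey_psi(r: float, c: float = C) -> float:
    """Derivative of tukey_rho; it 'redescends' to exactly 0 for |r| > c."""
    if abs(r) <= c:
        u = (r / c) ** 2
        return r * (1.0 - u) ** 2
    return 0.0


def tukey_weight(r: float, c: float = C) -> float:
    """IRLS weight psi(r) / r, written so that r = 0 needs no division."""
    return (1.0 - (r / c) ** 2) ** 2 if abs(r) <= c else 0.0


def robust_location(xs, c: float = C, tol: float = 1e-8, max_iter: int = 100) -> float:
    """Biweight M-estimate of location via iteratively reweighted least squares."""
    mu = statistics.median(xs)
    mad = statistics.median(abs(x - mu) for x in xs)
    scale = 1.4826 * mad if mad > 0 else 1.0  # MAD rescaled to match a normal SD
    for _ in range(max_iter):
        weights = [tukey_weight((x - mu) / scale, c) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [10, 11, 9, 10, 12, 10, 500]  # hypothetical counts with one gross outlier
print(statistics.mean(data), robust_location(data))
```

The single outlier pulls the plain mean to about 80. The biweight estimate stays near the bulk of the data (about 10), because observations far from the current fit receive weight zero. A non-redescending loss such as squared error always lets such points influence the fit.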
DiPhiSeq is publicly available as an R package on CRAN: https://cran.r-project.org/package=DiPhiSeq.
Supplementary data are available at Bioinformatics online.