DiPhiSeq: robust comparison of expression levels on RNA-Seq data with large sample sizes.

Affiliations

Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA.

Mathematics Department, Bryant University, Smithfield, RI, USA.

Publication information

Bioinformatics. 2019 Jul 1;35(13):2235-2242. doi: 10.1093/bioinformatics/bty952.

Abstract

MOTIVATION

Detecting differentially expressed (DE) genes in RNA-Seq data has been an active research area in recent years, and many methods have been proposed. DE genes show different average expression levels in different sample groups and can therefore be important biological markers. Although these methods are generally very successful, they need further tailoring for cancer data. In such data, expression across samples from the cancer group is often highly diverse, and this diversity is much larger than in the control group.

RESULTS

We propose a statistical method that detects not only genes with different average expression levels but also genes whose expression shows different degrees of diversity across groups. These 'differentially dispersed' genes can be important clinical markers. Our method applies a redescending penalty to the quasi-likelihood function, which makes it more robust against outliers and other noise. Simulations and real-data analysis show that DiPhiSeq outperforms existing methods in the presence of outliers and identifies unique sets of genes.
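The abstract does not say which redescending function DiPhiSeq uses. It also does not show how that function enters the negative-binomial quasi-likelihood. The sketch below is only an illustration of the general idea. It assumes Tukey's biweight as a representative redescending function, with the conventional tuning constant c = 4.685, and applies it to a simple location estimate instead of the paper's model. The helper names `biweight_rho`, `biweight_psi` and `robust_location` are hypothetical. The code shows the key property: the derivative psi returns to exactly zero for large residuals, so an extreme sample receives zero weight.

```python
import statistics


def biweight_rho(r: float, c: float = 4.685) -> float:
    """Tukey biweight loss: smooth near 0, flat at c^2/6 once |r| > c.
    (Assumed stand-in for the paper's redescending penalty.)"""
    if abs(r) <= c:
        u = (r / c) ** 2
        return (c * c / 6.0) * (1.0 - (1.0 - u) ** 3)
    return c * c / 6.0


def biweight_psi(r: float, c: float = 4.685) -> float:
    """Derivative of biweight_rho: rises, then 'redescends' to 0 for |r| > c."""
    if abs(r) <= c:
        u = (r / c) ** 2
        return r * (1.0 - u) ** 2
    return 0.0


def robust_location(xs, c=4.685, tol=1e-8, max_iter=100):
    """Hypothetical helper: biweight M-estimate of location via
    iteratively reweighted averaging, with a fixed MAD-based scale."""
    center = statistics.median(xs)
    # MAD scaled by 1.4826 to match the SD under normality.
    scale = 1.4826 * statistics.median([abs(x - center) for x in xs])
    if scale == 0:
        return center
    for _ in range(max_iter):
        weights = []
        for x in xs:
            r = (x - center) / scale
            # Weight psi(r)/r; it is exactly 0 for residuals beyond c.
            weights.append(biweight_psi(r, c) / r if r != 0 else 1.0)
        new_center = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center


if __name__ == "__main__":
    # Made-up log-expression values for one gene, with one outlying sample.
    samples = [5.1, 5.0, 4.9, 5.2, 4.8, 5.0, 12.0]
    print("mean:", statistics.mean(samples))
    print("biweight estimate:", robust_location(samples))
```

In this toy example the ordinary mean is pulled toward the outlier (to about 6.0). The biweight estimate stays near the bulk of the data (about 5.0), because the outlier's residual lies far beyond c and its weight drops to zero.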

AVAILABILITY AND IMPLEMENTATION

DiPhiSeq is publicly available as an R package on CRAN: https://cran.r-project.org/package=DiPhiSeq.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
