Department of Bioinformatics and Computational Biology, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA.
Bioinformatics. 2013 Mar 1;29(5):605-13. doi: 10.1093/bioinformatics/bts713. Epub 2013 Jan 9.
Identifying bimodally expressed genes is an important task because such genes play key roles in cell differentiation, signalling and disease progression. Several useful algorithms have been developed to identify bimodal genes from microarray data. Currently, however, no method can handle data from next-generation sequencing, which is emerging as a replacement technology for microarrays.
We present SIBER (systematic identification of bimodally expressed genes using RNAseq data) for effectively identifying bimodally expressed genes from next-generation RNAseq data. We evaluate several candidate methods for modelling RNAseq count data and compare their performance in identifying bimodal genes through both simulation and real data analysis. We show that the lognormal mixture model performs best in terms of power and robustness under various scenarios. We also compare our method with alternative approaches, including profile analysis using clustering and kurtosis (PACK) and cancer outlier profile analysis (COPA). Our method is robust, powerful, and invariant to shifting and scaling; it has no blind spots and has a sample-size-free interpretation.
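To make the lognormal mixture idea concrete, the following is a minimal sketch in Python, not the SIBER implementation (which is an R package). It fits a two-component normal mixture to log-transformed counts with a small EM loop. Several details are assumptions made for illustration only: the function name `fit_two_component_lognormal`, the `log(count + 1)` offset, the shared variance across the two components, the median-split initialisation, and the summary score `sqrt(pi1 * (1 - pi1)) * |mu1 - mu2| / sigma`, a commonly used bimodality index that the abstract itself does not spell out.

```python
import numpy as np


def fit_two_component_lognormal(counts, n_iter=500, tol=1e-8):
    """Fit a two-component normal mixture (shared variance) to log(counts + 1).

    Illustrative sketch only; names and modelling choices are assumptions,
    not the SIBER package's API.
    """
    y = np.log(np.asarray(counts, dtype=float) + 1.0)
    n = y.size

    # Initialise the two components by splitting the data at the median.
    med = np.median(y)
    low, high = y[y <= med], y[y > med]
    if high.size == 0:
        # No value lies above the median: no evidence of two modes.
        return {"mu1": float(med), "mu2": float(med),
                "sigma": 0.0, "pi1": 1.0, "bi": 0.0}
    mu1, mu2 = low.mean(), high.mean()
    sigma = max(y.std(), 1e-6)
    pi1 = 0.5
    prev_ll = -np.inf

    for _ in range(n_iter):
        # E-step: unnormalised component weights, then responsibilities.
        w1 = pi1 * np.exp(-0.5 * ((y - mu1) / sigma) ** 2)
        w2 = (1.0 - pi1) * np.exp(-0.5 * ((y - mu2) / sigma) ** 2)
        total = np.maximum(w1 + w2, 1e-300)  # guard against underflow
        r = w1 / total

        # Log-likelihood under the current parameters, used as stopping rule.
        ll = np.sum(np.log(total)) - n * np.log(sigma * np.sqrt(2.0 * np.pi))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

        # M-step: update mixing proportion, means and the shared variance.
        pi1 = float(np.clip(r.mean(), 1e-6, 1.0 - 1e-6))
        mu1 = np.sum(r * y) / max(np.sum(r), 1e-12)
        mu2 = np.sum((1.0 - r) * y) / max(np.sum(1.0 - r), 1e-12)
        var = np.sum(r * (y - mu1) ** 2 + (1.0 - r) * (y - mu2) ** 2) / n
        sigma = max(np.sqrt(var), 1e-6)

    # Assumed summary score: standardised distance between the two means,
    # weighted by how balanced the two groups are.
    bi = np.sqrt(pi1 * (1.0 - pi1)) * abs(mu1 - mu2) / sigma
    return {"mu1": float(mu1), "mu2": float(mu2),
            "sigma": float(sigma), "pi1": pi1, "bi": float(bi)}
```

In this sketch, a gene whose samples fall into two well-separated expression groups should receive a larger score than a gene with a single expression mode. Because the score is computed from the fitted proportion, means and standard deviation rather than from a test statistic, it does not grow automatically with the number of samples, which is one plausible reading of a "sample-size-free interpretation".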
The R package SIBER is available at the website http://bioinformatics.mdanderson.org/main/OOMPA:Overview.