Graduate School of Agricultural and Life Sciences, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo, 113-8657, Japan.
Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo, 113-8657, Japan.
BMC Bioinformatics. 2021 Oct 20;22(1):511. doi: 10.1186/s12859-021-04438-4.
RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering is used to classify DEGs with similar expression patterns for the subsequent analyses of data from experiments such as time-courses or multi-group comparisons. However, gene clustering has rarely been used for analyzing simple two-group data or differential expression (DE). In this study, we report that a model-based clustering algorithm implemented in an R package, MBCluster.Seq, can also be used for DE analysis.
MBCluster.Seq was originally designed to take DEGs as input, whereas the proposed method (called MBCdeg) uses all genes for the analysis. For overall gene ranking, the method uses the posterior probability that each gene is assigned to the cluster displaying a non-DEG pattern. Using simulated and real data, we compared the performance of MBCdeg with that of conventional R packages specialized for DE analysis, such as edgeR, DESeq2, and TCC. Our results showed that MBCdeg outperformed the other methods when the proportion of DEGs (P) was less than 50%. However, DEG identification using MBCdeg was less consistent than with the conventional methods. We compared the effects of different normalization algorithms on MBCdeg, and performed an analysis using MBCdeg in combination with a robust normalization algorithm (called DEGES) that is not implemented in MBCluster.Seq. The new analysis method showed greater stability than the original MBCdeg with the default normalization algorithm.
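The ranking step described above can be illustrated with a minimal sketch. It is not the MBCluster.Seq API or the authors' implementation: the function name, the input layout (a gene-by-cluster posterior matrix plus per-cluster group centers), and the rule for picking the non-DEG cluster (the cluster whose group centers differ least) are all assumptions made for illustration.

```python
def rank_by_non_deg_posterior(posteriors, centers):
    """Rank genes by their posterior probability of being in the non-DEG cluster.

    posteriors: one row per gene, one probability per cluster (hypothetical layout).
    centers: one row per cluster, one center value per group (hypothetical layout).
    Returns (index of the assumed non-DEG cluster, gene indices ordered from
    most likely DEG to least likely DEG).
    """
    # Assumption: the non-DEG cluster is the one whose centers differ least
    # across groups, i.e. the flattest expression pattern.
    non_deg = min(
        range(len(centers)),
        key=lambda k: max(centers[k]) - min(centers[k]),
    )
    # A low posterior for the non-DEG cluster means the gene is more likely a DEG,
    # so genes are sorted in ascending order of that posterior.
    scores = [row[non_deg] for row in posteriors]
    order = sorted(range(len(scores)), key=lambda g: scores[g])
    return non_deg, order


# Toy input: 3 genes, 3 clusters, 2 groups (made-up numbers for illustration only).
toy_posteriors = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.5, 0.2, 0.3],
]
toy_centers = [
    [0.0, 0.02],   # nearly flat across groups
    [-1.0, 1.0],   # higher in group 2
    [1.2, -1.1],   # higher in group 1
]
non_deg_cluster, gene_order = rank_by_non_deg_posterior(toy_posteriors, toy_centers)
```

In this toy input the first cluster is treated as the non-DEG pattern, and the gene with the smallest posterior for that cluster is ranked first. The cluster to which each gene is most strongly assigned additionally indicates its expression pattern.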
MBCdeg with DEGES normalization can be used to identify DEGs when P is relatively low. Because the method is based on gene clustering, the DE result also indicates the expression pattern to which each gene belongs. The new method may be useful for the analysis of time-course and multi-group data, where classification of expression patterns is often required.