Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road, NE, Atlanta, GA, USA.
Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, USA.
Biostatistics. 2020 Jul 1;21(3):610-624. doi: 10.1093/biostatistics/kxy081.
Biclustering techniques can identify local patterns in a data matrix by clustering the feature space and the sample space at the same time. Various biclustering methods have been proposed and successfully applied to the analysis of gene expression data. While existing biclustering methods have many desirable features, most of them were developed for continuous data, and few can efficiently handle -omics data of various types, for example, binomial data as in single nucleotide polymorphism data or negative binomial data as in RNA-seq data. In addition, none of the existing methods can utilize biological information such as that from functional genomics or proteomics. Recent work has shown that incorporating biological information can improve variable selection and prediction performance in analyses such as linear regression and multivariate analysis. In this article, we propose a novel Bayesian biclustering method that can handle multiple data types, including Gaussian, binomial, and negative binomial. In addition, our method uses a Bayesian adaptive structured shrinkage prior that enables feature selection guided by existing biological information. Our simulation studies and application to multi-omics datasets demonstrate robust and superior performance of the proposed method compared to other existing biclustering methods.
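To make the three data types named in the abstract concrete, the sketch below writes out the log-likelihood of a single matrix entry under each family. It is a generic illustration using textbook parameterizations, not the authors' model or code: the function names, the `LOGLIK` lookup table, and the mean/dispersion parameterization of the negative binomial are all assumptions introduced here, and the abstract does not say how the proposed method links these likelihoods to its bicluster structure or shrinkage prior.

```python
import math


def gaussian_loglik(x, mean, sd):
    """Log-density of a continuous value x (e.g. a normalized expression level)."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def binomial_loglik(k, n, p):
    """Log-probability of k successes in n trials, 0 < p < 1.

    For SNP genotypes coded as minor-allele counts, n would be 2.
    """
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def negbinom_loglik(k, mean, size):
    """Log-probability of a count k (e.g. an RNA-seq read count).

    Assumed parameterization: variance = mean + mean**2 / size.
    """
    log_coef = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    return (log_coef
            + size * math.log(size / (size + mean))
            + k * math.log(mean / (size + mean)))


# Hypothetical lookup: one likelihood per data type, so the same clustering
# machinery could in principle score Gaussian, binomial, or count data.
LOGLIK = {
    "gaussian": gaussian_loglik,
    "binomial": binomial_loglik,
    "negative_binomial": negbinom_loglik,
}
```

For instance, `LOGLIK["binomial"](1, 2, 0.3)` scores a heterozygous genotype under an allele frequency of 0.3, while `LOGLIK["negative_binomial"](12, 10.0, 5.0)` scores a read count of 12 under a mean of 10 with overdispersion. Keeping the likelihood behind a single lookup is one simple way to let the rest of a model stay agnostic to the data type.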