Bayesian generalized biclustering analysis via adaptive structured shrinkage.

Affiliations

Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road, NE, Atlanta, GA, USA.

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, USA.

Publication information

Biostatistics. 2020 Jul 1;21(3):610-624. doi: 10.1093/biostatistics/kxy081.

Abstract

Biclustering techniques can identify local patterns in a data matrix by clustering the feature space and the sample space at the same time. Various biclustering methods have been proposed and successfully applied to the analysis of gene expression data. While existing biclustering methods have many desirable features, most of them were developed for continuous data, and few can efficiently handle -omics data of various types, for example, binomial data as in single nucleotide polymorphism data or negative binomial data as in RNA-seq data. In addition, none of the existing methods can use biological information such as that from functional genomics or proteomics. Recent work has shown that incorporating biological information can improve variable selection and prediction performance in analyses such as linear regression and multivariate analysis. In this article, we propose a novel Bayesian biclustering method that can handle multiple data types, including Gaussian, binomial, and negative binomial. In addition, our method uses a Bayesian adaptive structured shrinkage prior that enables feature selection guided by existing biological information. Our simulation studies and application to multi-omics datasets demonstrate robust and superior performance of the proposed method compared with other existing biclustering methods.
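
To make the idea of clustering features and samples at the same time concrete, the following is a minimal sketch on simulated Gaussian data. It is not the Bayesian method proposed in the article: it swaps in a simple rank-1 sparse factorization with soft-thresholding, and the function names, thresholds, and simulated block are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries within lam of zero become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def rank1_sparse_bicluster(X, lam_u, lam_v, n_iter=50):
    """Hypothetical helper: alternate sparse updates of a row vector u and a
    column vector v so that X is approximated by a sparse rank-1 layer.
    The nonzero entries of u and v give the rows and columns of one bicluster."""
    # Start from the leading singular vectors of X.
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        u = soft_threshold(X @ v, lam_u)
        if not u.any():
            break  # threshold removed every row; no bicluster found
        u = u / np.linalg.norm(u)
        v = soft_threshold(X.T @ u, lam_v)
        if not v.any():
            break  # threshold removed every column
        v = v / np.linalg.norm(v)
    return np.flatnonzero(u), np.flatnonzero(v)

# Simulated data (an assumption, not from the article): 60 rows by 40 columns
# of Gaussian noise with one planted block in rows 0-9 and columns 0-7.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.5, size=(60, 40))
X[:10, :8] += 5.0

rows, cols = rank1_sparse_bicluster(X, lam_u=5.0, lam_v=5.0)
print("bicluster rows:", rows)
print("bicluster columns:", cols)
```

The soft-threshold step plays a role loosely analogous to a shrinkage prior, in that it pushes small loadings to exactly zero. Unlike the structured prior described in the abstract, it uses a single fixed threshold per side, takes no biological information into account, and handles only continuous data.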
