Bayesian generalized biclustering analysis via adaptive structured shrinkage.

Affiliations

Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road, NE, Atlanta, GA, USA.

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, USA.

Publication information

Biostatistics. 2020 Jul 1;21(3):610-624. doi: 10.1093/biostatistics/kxy081.

PMID:30596887

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7307984/

Abstract

Biclustering techniques identify local patterns in a data matrix by clustering the feature space and the sample space simultaneously. Various biclustering methods have been proposed and successfully applied to the analysis of gene expression data. While existing biclustering methods have many desirable features, most were developed for continuous data, and few can efficiently handle -omics data of various types, for example, binomial data as in single nucleotide polymorphism data or negative binomial data as in RNA-seq data. In addition, none of the existing methods can utilize biological information such as that from functional genomics or proteomics. Recent work has shown that incorporating biological information can improve variable selection and prediction performance in analyses such as linear regression and multivariate analysis. In this article, we propose a novel Bayesian biclustering method that can handle multiple data types, including Gaussian, binomial, and negative binomial. In addition, our method uses a Bayesian adaptive structured shrinkage prior that enables feature selection guided by existing biological information. Our simulation studies and application to multi-omics datasets demonstrate robust and superior performance of the proposed method compared to other existing biclustering methods.
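
To make the three data types named in the abstract concrete, the sketch below simulates a matrix with one planted bicluster under Gaussian, binomial, and negative binomial sampling. It is not the authors' model or code: the rank-one signal, the link functions, the offsets, the dispersion value, and every name (`simulate_planted_bicluster`, `w`, `z`) are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_planted_bicluster(n_features=60, n_samples=40,
                               family="gaussian", seed=0):
    """Simulate a features-by-samples matrix containing one planted bicluster.

    Hypothetical setup (not taken from the paper): the natural parameter is
    theta = w z^T, where w and z are sparse and nonzero only for the features
    and samples that belong to the bicluster.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)
    z = np.zeros(n_samples)
    w[:15] = 1.5  # first 15 features belong to the bicluster
    z[:10] = 1.0  # first 10 samples belong to the bicluster
    theta = np.outer(w, z)  # rank-one signal, zero outside the bicluster

    if family == "gaussian":
        # Identity link: theta is the mean.
        x = rng.normal(loc=theta, scale=1.0)
    elif family == "binomial":
        # Logit link; two trials mimic SNP genotypes coded 0/1/2.
        p = 1.0 / (1.0 + np.exp(-(theta - 1.0)))
        x = rng.binomial(n=2, p=p)
    elif family == "negative_binomial":
        # Log link; r is an assumed dispersion parameter for count data.
        mean = np.exp(theta + 2.0)
        r = 5.0
        x = rng.negative_binomial(n=r, p=r / (r + mean))
    else:
        raise ValueError(f"unknown family: {family}")
    return x, w != 0, z != 0


if __name__ == "__main__":
    for fam in ("gaussian", "binomial", "negative_binomial"):
        x, rows, cols = simulate_planted_bicluster(family=fam)
        inside = x[np.ix_(rows, cols)].mean()
        outside = x[np.ix_(~rows, ~cols)].mean()
        print(f"{fam}: mean inside={inside:.2f}, outside={outside:.2f}")
```

In each family, entries inside the planted block have a shifted mean relative to the background, which is the kind of local pattern a biclustering method tries to recover without knowing the row and column memberships in advance.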
