Consensus clustering for Bayesian mixture models.

Affiliations

MRC Biostatistics Unit, University of Cambridge, Cambridge, UK.

Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK.

Publication information

BMC Bioinformatics. 2022 Jul 21;23(1):290. doi: 10.1186/s12859-022-04830-8.

DOI:10.1186/s12859-022-04830-8

PMID:35864476

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9306175/

Abstract

BACKGROUND

Cluster analysis is an integral part of precision medicine and systems biology, used to define groups of patients or biomolecules. Consensus clustering is an ensemble approach that is widely used in these areas, which combines the output from multiple runs of a non-deterministic clustering algorithm. Here we consider the application of consensus clustering to a broad class of heuristic clustering algorithms that can be derived from Bayesian mixture models (and extensions thereof) by adopting an early stopping criterion when performing sampling-based inference for these models. While the resulting approach is non-Bayesian, it inherits the usual benefits of consensus clustering, particularly in terms of computational scalability and providing assessments of clustering stability/robustness.
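The core of the ensemble step described above is combining the partitions from many runs into a single summary. A minimal sketch, assuming each run yields one label vector over the same items: the consensus matrix records, for every pair of items, the proportion of runs that place them in the same cluster. The function name `consensus_matrix` and the example partitions are illustrative, not taken from the paper.

```python
import numpy as np

def consensus_matrix(partitions):
    """Proportion of runs in which each pair of items shares a cluster label."""
    labels = np.asarray(partitions)          # shape (n_runs, n_items)
    # same[r, i, j] is True when run r puts items i and j in the same cluster.
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

# Three hypothetical runs over four items; label values are arbitrary within a run,
# so only agreement between items matters, not the label itself.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
cm = consensus_matrix(runs)
print(cm)
```

Entries near 0 or 1 indicate pairs that the runs agree on, while intermediate values flag unstable assignments, which is one way to read the stability assessment mentioned above.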

RESULTS

In simulation studies, we show that our approach can successfully uncover the target clustering structure, while also exploring different plausible clusterings of the data. We show that, when a parallel computation environment is available, our approach offers significant reductions in runtime compared to performing sampling-based Bayesian inference for the underlying model, while retaining many of the practical benefits of the Bayesian approach, such as exploring different numbers of clusters. We propose a heuristic to decide upon ensemble size and the early stopping criterion, and then apply consensus clustering to a clustering algorithm derived from a Bayesian integrative clustering method. We use the resulting approach to perform an integrative analysis of three 'omics datasets for budding yeast and find clusters of co-expressed genes with shared regulatory proteins. We validate these clusters using data external to the analysis.

CONCLUSIONS

Our approach can be used as a wrapper for essentially any existing sampling-based Bayesian clustering implementation, and enables meaningful clustering analyses to be performed using such implementations, even when computational Bayesian inference is not feasible, e.g. due to poor exploration of the target density (often as a result of increasing numbers of features) or a limited computational budget that does not allow sufficient samples to be drawn from a single chain. This enables researchers to straightforwardly extend the applicability of existing software to much larger datasets, including implementations of sophisticated models such as those that jointly model multiple datasets.
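The wrapper idea can be sketched as follows, under stated assumptions: many independent chains are each stopped early after a fixed number of iterations, only the final sample of each chain is kept, and the kept partitions are averaged into a consensus matrix. `toy_chain` is a hypothetical stand-in for an existing sampler (a crude stochastic assignment scheme on 1-D data, not a real Bayesian mixture implementation), and the parameter names `width` (number of chains) and `depth` (iterations per chain) are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def toy_chain(data, depth, seed, n_clusters=2):
    """Hypothetical stand-in for an existing sampler; returns labels at iteration `depth`."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=data.size)
    for _ in range(depth):
        # Cluster means; an empty cluster is re-seeded at a random data point.
        means = np.array([
            data[labels == k].mean() if np.any(labels == k) else rng.choice(data)
            for k in range(n_clusters)
        ])
        # Sample each label with probability proportional to a unit-variance Gaussian density.
        log_w = -0.5 * (data[:, None] - means[None, :]) ** 2
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        p = w / w.sum(axis=1, keepdims=True)
        labels = np.array([rng.choice(n_clusters, p=row) for row in p])
    return labels

def consensus_from_chains(data, width, depth):
    """Run `width` short chains independently and average co-clustering of their final samples."""
    with ThreadPoolExecutor() as pool:
        finals = list(pool.map(lambda s: toy_chain(data, depth, s), range(width)))
    labels = np.stack(finals)                # shape (width, n_items)
    return (labels[:, :, None] == labels[:, None, :]).mean(axis=0)

data = np.array([-0.2, 0.1, 0.3, 9.8, 10.0, 10.4])
cm = consensus_from_chains(data, width=20, depth=5)
print(np.round(cm, 2))
```

Because the chains do not communicate, they can be distributed across whatever parallel resources are available; a thread pool is used here only to keep the sketch self-contained, and a process pool or cluster scheduler would be the more likely choice for a real sampler.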

