Gaussian mixture copulas for high-dimensional clustering and dependency-based subtyping.

Affiliations

Department of Information Systems and Analytics, School of Computing, National University of Singapore, 117418 Singapore.

TCS Innovation Labs, Kolkata 700156, India.

Publication information

Bioinformatics. 2020 Jan 15;36(2):621-628. doi: 10.1093/bioinformatics/btz599.

DOI: 10.1093/bioinformatics/btz599
PMID: 31368480

Abstract

MOTIVATION

The identification of sub-populations of patients with similar characteristics, called patient subtyping, is important for realizing the goals of precision medicine. Accurate subtyping is crucial for tailoring therapeutic strategies that can potentially lead to reduced mortality and morbidity. Model-based clustering, such as Gaussian mixture models, provides a principled and interpretable methodology that is widely used to identify subtypes. However, such models impose identical marginal distributions on each variable; this assumption restricts their modeling flexibility and degrades clustering performance.

RESULTS

In this paper, we use the statistical framework of copulas to decouple the modeling of marginals from the modeling of the dependencies between them. Current copula-based methods cannot scale to high dimensions due to challenges in parameter inference. We develop HD-GMCM, which addresses these challenges and, to our knowledge, is the first copula-based clustering method that can fit high-dimensional data. Our experiments on real high-dimensional gene-expression and clinical datasets show that HD-GMCM outperforms state-of-the-art model-based clustering methods, by virtue of modeling non-Gaussian data and being robust to outliers through the use of Gaussian mixture copulas. We present a case study on lung cancer data from TCGA. Clusters obtained from HD-GMCM can be interpreted based on the dependencies they model, which offers a new way of characterizing subtypes. Empirically, such modeling not only uncovers latent structure that leads to better clustering but also yields meaningful clinical subtypes in terms of patient survival rates.
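
To illustrate what decoupling marginals from dependencies means in practice, the following minimal Python sketch computes rank-based pseudo-observations, a common first step in copula modeling. It is not the HD-GMCM inference procedure, which the abstract does not describe; the function name `pseudo_observations`, the sample size, and the chosen distortions are assumptions made for illustration only. The sketch shows that strictly increasing transformations of each variable leave the pseudo-observations, and hence the dependence structure a copula would see, unchanged.

```python
import numpy as np


def pseudo_observations(x):
    """Map each column of x into (0, 1) using scaled ranks.

    Hypothetical helper for illustration; assumes no ties within a column.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort applied twice yields 0-based ranks per column
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    # dividing by n + 1 keeps values strictly inside (0, 1)
    return ranks / (n + 1.0)


rng = np.random.default_rng(0)

# Correlated bivariate Gaussian sample (illustrative parameters)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

# Distort each marginal with a strictly increasing map:
# column 0 becomes log-normal, column 1 becomes heavy-tailed
x = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3])

u_z = pseudo_observations(z)
u_x = pseudo_observations(x)

# The ranks, and therefore the pseudo-observations, are identical
same = np.array_equal(u_z, u_x)
print("pseudo-observations unchanged by marginal distortion:", same)
```

A Gaussian mixture model fitted directly to `x` would have to absorb the non-Gaussian marginals into its components, whereas a copula-based model works with the dependence captured in `u_x` and treats the marginals separately.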

AVAILABILITY AND IMPLEMENTATION

An implementation of HD-GMCM in R is available at: https://bitbucket.org/cdal/hdgmcm/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. Gaussian mixture copulas for high-dimensional clustering and dependency-based subtyping.
   Bioinformatics. 2020 Jan 15;36(2):621-628. doi: 10.1093/bioinformatics/btz599.
2. Weighted dimensionality reduction and robust Gaussian mixture model based cancer patient subtyping from gene expression data.
   J Biomed Inform. 2020 Dec;112:103620. doi: 10.1016/j.jbi.2020.103620. Epub 2020 Nov 11.
3. Robust clustering of noisy high-dimensional gene expression data for patients subtyping.
   Bioinformatics. 2018 Dec 1;34(23):4064-4072. doi: 10.1093/bioinformatics/bty502.
4. A Bayesian two-way latent structure model for genomic data integration reveals few pan-genomic cluster subtypes in a breast cancer cohort.
   Bioinformatics. 2019 Dec 1;35(23):4886-4897. doi: 10.1093/bioinformatics/btz381.
5. Robust correlation estimation and UMAP assisted topological analysis of omics data for disease subtyping.
   Comput Biol Med. 2023 Mar;155:106640. doi: 10.1016/j.compbiomed.2023.106640. Epub 2023 Feb 8.
6. Subtype-GAN: a deep learning approach for integrative cancer subtyping of multi-omics data.
   Bioinformatics. 2021 Aug 25;37(16):2231-2237. doi: 10.1093/bioinformatics/btab109.
7. Novel pruning and truncating of the mixture of vine copula clustering models.
   Sci Rep. 2022 Nov 17;12(1):19815. doi: 10.1038/s41598-022-24274-7.
8. R-vine models for spatial time series with an application to daily mean temperature.
   Biometrics. 2015 Jun;71(2):323-32. doi: 10.1111/biom.12279. Epub 2015 Feb 6.
9. Towards clinically more relevant dissection of patient heterogeneity via survival-based Bayesian clustering.
   Bioinformatics. 2017 Nov 15;33(22):3558-3566. doi: 10.1093/bioinformatics/btx464.
10. A joint finite mixture model for clustering genes from independent Gaussian and beta distributed data.
    BMC Bioinformatics. 2009 May 29;10:165. doi: 10.1186/1471-2105-10-165.

Cited by

1. RgCop-A regularized copula based method for gene selection in single-cell RNA-seq data.
   PLoS Comput Biol. 2021 Oct 19;17(10):e1009464. doi: 10.1371/journal.pcbi.1009464. eCollection 2021 Oct.