A Bayesian two-way latent structure model for genomic data integration reveals few pan-genomic cluster subtypes in a breast cancer cohort.

Author information

Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway.

Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.

Publication information

Bioinformatics. 2019 Dec 1;35(23):4886-4897. doi: 10.1093/bioinformatics/btz381.

DOI: 10.1093/bioinformatics/btz381
PMID: 31077301

Abstract

MOTIVATION

Unsupervised clustering is important in disease subtyping, among other genomic applications. As genomic data has become more multifaceted, how to cluster across data sources for more precise subtyping is an ever more important area of research. Many of the methods proposed so far, including iCluster and Cluster of Cluster Assignments (COCAs), make an unreasonable assumption of a common clustering across all data sources, and those that do not are fewer and tend to be computationally intensive.

RESULTS

We propose a Bayesian parametric model for integrative, unsupervised clustering across data sources. In our two-way latent structure model, samples are clustered in relation to each specific data source, distinguishing it from methods like COCAs and iCluster, but cluster labels have across-dataset meaning, allowing cluster information to be shared between data sources. A common scaling across data sources is not required, and inference is obtained by a Gibbs sampler, which we improve with a warm start strategy and modified density functions to make convergence more robust and faster. Posterior interpretation allows for inference on common clusterings occurring among subsets of data sources. An interesting statistical formulation of the model results in sampling from closed-form posteriors despite incorporation of a complex latent structure. We fit the model with Gaussian and more general densities, which influences the degree of across-dataset cluster label sharing. Uniquely among integrative clustering models, our formulation makes no nestedness assumptions of samples across data sources, so that a sample missing data from one genomic source can be clustered according to its existing data sources. We apply our model to a Norwegian breast cancer cohort of ductal carcinoma in situ and invasive tumors, comprising somatic copy-number alteration (CNA), methylation and expression datasets. We find enrichment in the Her2 subtype and ductal carcinoma among those observations exhibiting greater cluster correspondence across expression and CNA data. In general, there are few pan-genomic clusterings, suggesting that models assuming a common clustering across genomic data sources might yield misleading results.
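
The idea of "cluster correspondence" across two data sources can be illustrated with a minimal Python sketch. It is not the twl package's API and not the authors' implementation. It assumes that posterior draws of cluster labels are already available for each source as plain lists, that labels are comparable across sources (as the abstract states), and that a sample with no data in a source is marked with `None`. The function name `label_agreement` and the toy draws are hypothetical.

```python
from typing import List, Optional, Sequence

Label = Optional[int]  # None marks a sample with no data in that source (assumption)


def label_agreement(
    draws_a: Sequence[Sequence[Label]],
    draws_b: Sequence[Sequence[Label]],
) -> List[Optional[float]]:
    """Per-sample fraction of posterior draws with the same label in both sources.

    Each argument holds one row per posterior draw and one column per sample.
    Samples missing from either source get None instead of a fraction.
    """
    if len(draws_a) != len(draws_b) or not draws_a:
        raise ValueError("both sources need the same, non-zero number of draws")
    n_samples = len(draws_a[0])
    result: List[Optional[float]] = []
    for i in range(n_samples):
        # Pair up the labels of sample i across the two sources, draw by draw.
        pairs = [(a[i], b[i]) for a, b in zip(draws_a, draws_b)]
        if any(x is None or y is None for x, y in pairs):
            result.append(None)  # correspondence is undefined for this sample
            continue
        matches = sum(1 for x, y in pairs if x == y)
        result.append(matches / len(pairs))
    return result


# Toy posterior draws (4 draws, 4 samples); the last sample has no CNA data.
expression_draws = [[0, 0, 1, 2], [0, 1, 1, 2], [0, 0, 1, 2], [0, 0, 1, 2]]
cna_draws = [[0, 1, 1, None], [0, 1, 1, None], [0, 1, 0, None], [0, 1, 1, None]]

print(label_agreement(expression_draws, cna_draws))
```

In this toy example, a sample with agreement near 1 would be a candidate for a clustering shared between expression and CNA data, while the sample lacking CNA data keeps its expression labels but has no correspondence score.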

AVAILABILITY AND IMPLEMENTATION

The model is implemented in an R package called twl ('two-way latent'), available on CRAN. Data for analysis are available within the R package.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
