Integrating multidimensional data for clustering analysis with applications to cancer patient data.

Authors

Park Seyoung, Xu Hao, Zhao Hongyu

Affiliations

Department of Statistics, Sungkyunkwan University, Seoul, Korea.

Department of Biostatistics, Yale School of Public Health, New Haven, CT.

Publication information

J Am Stat Assoc. 2021;116(533):14-26. doi: 10.1080/01621459.2020.1730853. Epub 2020 Mar 19.

DOI: 10.1080/01621459.2020.1730853
PMID: 36339813
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9634961/

Abstract

Advances in high-throughput genomic technologies, coupled with large-scale studies including The Cancer Genome Atlas (TCGA) project, have generated rich resources of diverse types of omics data to better understand cancer etiology and treatment responses. Clustering patients into subtypes with similar disease etiologies and/or treatment responses using multiple omics data types has the potential to improve clustering precision compared with using a single data type. However, in practice, patient clustering is still mostly based on a single type of omics data or on ad hoc integration of clustering results from individual data types, leading to potential loss of information. By treating each omics data type as a different informative representation of the patients, we propose a novel multi-view spectral clustering framework to integrate different omics data types measured from the same subject. We learn the weight of each data type as well as a similarity measure between patients via a non-convex optimization framework. We solve the proposed non-convex problem iteratively using the ADMM algorithm and show the convergence of the algorithm. The accuracy and robustness of the proposed clustering method are studied both in theory and using various synthetic data sets. When our method is applied to the TCGA data, the patient clusters inferred by our method show more significant differences in survival times between clusters than those inferred by existing clustering methods.
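
The general idea of multi-view spectral clustering can be illustrated with a minimal sketch. This is not the authors' method: the abstract describes learning the view weights and the patient similarity jointly through a non-convex problem solved with ADMM, whereas the sketch below assumes fixed, user-supplied weights, a Gaussian kernel with a median-heuristic bandwidth, and plain k-means on the spectral embedding. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_affinity(X):
    """Gaussian-kernel similarity between rows (patients) of one data view."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    sigma2 = np.median(d2[d2 > 0])        # median heuristic: an assumption
    W = np.exp(-d2 / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)              # no self-similarity
    return W

def spectral_embedding(W, k):
    """Row-normalized top-k eigenvectors of D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    U = vecs[:, -k:]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def multiview_spectral_clustering(views, weights, k):
    """Fuse per-view affinities with FIXED weights, then cluster spectrally."""
    W = sum(w * gaussian_affinity(X) for w, X in zip(weights, views))
    return kmeans(spectral_embedding(W, k), k)

# Synthetic stand-in for two omics views measured on the same 40 patients.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)             # two planted patient groups
view_a = rng.normal(size=(40, 5)) + 6.0 * truth[:, None]
view_b = rng.normal(size=(40, 3)) + 6.0 * truth[:, None]

labels = multiview_spectral_clustering([view_a, view_b], [0.5, 0.5], k=2)
# Cluster labels are arbitrary, so compare up to a swap of the two labels.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with planted groups: {agreement:.2f}")
```

Equal weights are used here only for simplicity. In the setting the abstract describes, an uninformative view would ideally receive a smaller weight, which is what learning the weights is meant to achieve.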
