Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data.

Authors

Wang Minjie, Allen Genevera I

Affiliations

Department of Statistics, Rice University, Houston, TX 77005, USA.

Departments of Electrical and Computer Engineering, Statistics, and Computer Science, Rice University, Houston, TX 77005, USA; Jan and Dan Duncan Neurological Research Institute, Baylor College of Medicine, Houston, TX 77030, USA.

Publication

J Mach Learn Res. 2021 Jan;22.

PMID: 34744522
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8570363/
Abstract

In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among the samples that may be hidden in individualistic cluster analyses of a single data view. While several techniques for such integrative clustering have been explored, we propose and develop a convex formalization that enjoys strong empirical performance and inherits the mathematical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs different convex distances, losses, or divergences for each of the different data views with a joint convex fusion penalty that leads to common groups. Additionally, integrating mixed multi-view data is often challenging when each data source is high-dimensional. To perform feature selection in such scenarios, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. Our so-called iGecco+ approach selects features from each data view that are best for determining the groups, often leading to improved integrative clustering. To solve our problem, we develop a new type of generalized multi-block ADMM algorithm using sub-problem approximations that more efficiently fits our model for big data sets. Through a series of numerical experiments and real data examples on text mining and genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data.
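The structure described in the abstract (a separate convex loss per data view, tied together by one joint fusion penalty) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of squared and absolute error as the per-view losses, and the exact form of the penalty (a weighted L2 norm on differences of concatenated per-sample centroids) are assumptions made for illustration only. The sketch evaluates such an objective; it does not implement the multi-block ADMM solver or the feature-selection penalty.

```python
import numpy as np

# Hypothetical per-view losses (assumed for illustration; the abstract only
# says "different convex distances, losses, or divergences").
def squared_error(x, u):
    return 0.5 * np.sum((x - u) ** 2)

def absolute_error(x, u):
    return np.sum(np.abs(x - u))

def joint_fusion_penalty(centroids, weights):
    """Weighted sum of L2 norms of pairwise centroid differences.

    Centroids from all views are concatenated per sample, so two samples
    are only "fused" when their centroids agree in every view at once.
    """
    stacked = np.hstack(centroids)  # shape: (n_samples, total_features)
    n = stacked.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += weights[i, j] * np.linalg.norm(stacked[i] - stacked[j])
    return total

def integrative_objective(views, centroids, losses, gamma, weights):
    """Sum of view-specific losses plus gamma times the joint fusion penalty."""
    fit = sum(loss(x, u) for x, u, loss in zip(views, centroids, losses))
    return fit + gamma * joint_fusion_penalty(centroids, weights)

# Toy data: two samples measured in two views.
X1 = np.array([[0.0, 0.0], [2.0, 0.0]])  # view 1, scored with squared error
X2 = np.array([[0.0], [3.0]])            # view 2, scored with absolute error
W = np.ones((2, 2))                      # uniform fusion weights (assumed)
losses = [squared_error, absolute_error]
gamma = 0.5

# No fusion: every sample keeps its own centroid, so only the penalty is paid.
unfused = integrative_objective([X1, X2], [X1, X2], losses, gamma, W)

# Full fusion: both samples share one centroid per view, so only the loss is paid.
U1 = np.tile(X1.mean(axis=0), (2, 1))
U2 = np.tile(np.median(X2, axis=0), (2, 1))
fused = integrative_objective([X1, X2], [U1, U2], losses, gamma, W)
```

In this toy setup, raising `gamma` makes the unfused solution more expensive while leaving the fused one unchanged, which is the mechanism by which a fusion penalty merges samples into common groups.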

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b2d/8570363/780c3a4fc5af/nihms-1715302-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b2d/8570363/b64878e5a4d9/nihms-1715302-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b2d/8570363/e8de4168b629/nihms-1715302-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b2d/8570363/a7152f95ccb9/nihms-1715302-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b2d/8570363/a9a06dc8badd/nihms-1715302-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b2d/8570363/d9e99007e7c5/nihms-1715302-f0003.jpg
