Convex biclustering.

Authors

Chi Eric C, Allen Genevera I, Baraniuk Richard G

Affiliations

Department of Statistics, North Carolina State University, 2311 Stinson Dr, Raleigh, North Carolina, U.S.A.

Department of Statistics, Rice University, 6100 Main St, Houston, Texas, U.S.A.

Publication information

Biometrics. 2017 Mar;73(1):10-19. doi: 10.1111/biom.12540. Epub 2016 May 10.

DOI:10.1111/biom.12540

PMID:27163413

Abstract

In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, the problem of identifying structure in high-dimensional genomic data motivates this work. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer, together with an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees: features that arguably are lacking in the current alternative algorithms. We demonstrate the advantages of our approach, which include stably and reproducibly identifying biclusterings, on simulated and real microarray data.
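The abstract does not spell out the convex formulation, so the following is only a minimal sketch of the kind of objective usually meant by convex biclustering: a least-squares fit of an estimate U to the data matrix X plus fused penalties on differences between columns and between rows of U, scaled by a single tuning parameter gamma. The function name, the argument names, and the default of unit weights are assumptions made for illustration, and the code only evaluates the objective; it is not COBRA and does not minimize anything.

```python
import math


def convex_biclustering_objective(X, U, gamma, row_weights=None, col_weights=None):
    """Evaluate 0.5 * ||X - U||_F^2 + gamma * (column penalty + row penalty).

    X and U are equally sized lists of rows. The weight matrices are optional
    and default to 1 for every pair (an assumption made for this sketch).
    """
    n_rows, n_cols = len(X), len(X[0])

    # Squared Frobenius distance between the data and the estimate.
    fit = 0.5 * sum(
        (X[k][i] - U[k][i]) ** 2 for k in range(n_rows) for i in range(n_cols)
    )

    def fused_penalty(vectors, weights):
        # Weighted sum of Euclidean distances over all unordered pairs.
        total = 0.0
        for a in range(len(vectors)):
            for b in range(a + 1, len(vectors)):
                w = 1.0 if weights is None else weights[a][b]
                total += w * math.dist(vectors[a], vectors[b])
        return total

    columns = [[U[k][i] for k in range(n_rows)] for i in range(n_cols)]
    penalty = fused_penalty(columns, col_weights) + fused_penalty(U, row_weights)
    return fit + gamma * penalty


if __name__ == "__main__":
    X = [[0.0, 0.0], [2.0, 2.0]]
    flat = [[1.0, 1.0], [1.0, 1.0]]  # every entry replaced by the overall mean
    print(convex_biclustering_objective(X, X, gamma=1.0))
    print(convex_biclustering_objective(X, flat, gamma=1.0))
```

Under this sketch, gamma = 0 makes U = X optimal (every entry is its own bicluster), while a large gamma favors an estimate whose rows and columns have fused into a few constant blocks. That trade-off is what a solution path indexed by one tuning parameter traces out.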

Similar articles

Finding multiple coherent biclusters in microarray data using variable string length multiobjective genetic algorithm.

IEEE Trans Inf Technol Biomed. 2009 Nov;13(6):969-75. doi: 10.1109/TITB.2009.2017527. Epub 2009 Mar 16.

A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data.

J Theor Biol. 2008 Mar 21;251(2):264-74. doi: 10.1016/j.jtbi.2007.11.030. Epub 2007 Dec 4.

Discovering biclusters in gene expression data based on high-dimensional linear geometries.

BMC Bioinformatics. 2008 Apr 23;9:209. doi: 10.1186/1471-2105-9-209.

Parallelized evolutionary learning for detection of biclusters in gene expression data.

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):560-70. doi: 10.1109/TCBB.2011.53. Epub 2011 Mar 3.

QUBIC: a qualitative biclustering algorithm for analyses of gene expression data.

Nucleic Acids Res. 2009 Aug;37(15):e101. doi: 10.1093/nar/gkp491. Epub 2009 Jun 9.

Dynamic biclustering of microarray data by multi-objective immune optimization.

BMC Genomics. 2011;12 Suppl 2(Suppl 2):S11. doi: 10.1186/1471-2164-12-S2-S11. Epub 2011 Jul 27.

Biclustering algorithms for biological data analysis: a survey.

IEEE/ACM Trans Comput Biol Bioinform. 2004 Jan-Mar;1(1):24-45. doi: 10.1109/TCBB.2004.2.

A graph spectrum based geometric biclustering algorithm.

J Theor Biol. 2013 Jan 21;317:200-11. doi: 10.1016/j.jtbi.2012.10.012. Epub 2012 Oct 16.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Sep-Oct;11(5):942-54. doi: 10.1109/TCBB.2014.2325016.

Cited by

Optimal variable clustering for high-dimensional matrix valued data.

Inf Inference. 2025 Mar 12;14(1):iaaf001. doi: 10.1093/imaiai/iaaf001. eCollection 2025 Mar.

Robust convex biclustering with a tuning-free method.

J Appl Stat. 2024 Jun 17;52(2):271-286. doi: 10.1080/02664763.2024.2367143. eCollection 2025.

Biclustering Multivariate Longitudinal Data with Application to Recovery Trajectories of White Matter After Sport-Related Concussion.

Data Sci Sci. 2024;3(1). doi: 10.1080/26941899.2024.2376535. Epub 2024 Jul 16.

A Hyperparameter-Free, Fast and Efficient Framework to Detect Clusters From Limited Samples Based on Ultra High-Dimensional Features.

IEEE Access. 2022;10:116844-116857. doi: 10.1109/access.2022.3218800. Epub 2022 Nov 1.

Robust integrative biclustering for multi-view data.

Stat Methods Med Res. 2022 Nov;31(11):2201-2216. doi: 10.1177/09622802221122427. Epub 2022 Sep 13.

Discovering Geometry in Data Arrays.

Comput Sci Eng. 2021 Nov-Dec;23(6):42-51. doi: 10.1109/mcse.2021.3120039. Epub 2021 Oct 14.

Multi-scale affinities with missing data: Estimation and applications.

Stat Anal Data Min. 2022 Jun;15(3):303-313. doi: 10.1002/sam.11561. Epub 2021 Nov 5.

Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data.

J Mach Learn Res. 2021 Jan;22.

COBRAC: a fast implementation of convex biclustering with compression.

Bioinformatics. 2021 Oct 25;37(20):3667-3669. doi: 10.1093/bioinformatics/btab248.

Simultaneous Parameter Learning and Bi-clustering for Multi-Response Models.

Front Big Data. 2019 Aug 14;2:27. doi: 10.3389/fdata.2019.00027. eCollection 2019.
