Convex biclustering.

Author information

Chi Eric C, Allen Genevera I, Baraniuk Richard G

Affiliations

Department of Statistics, North Carolina State University, 2311 Stinson Dr, Raleigh, North Carolina, U.S.A.

Department of Statistics, Rice University, 6100 Main St, Houston, Texas, U.S.A.

Publication information

Biometrics. 2017 Mar;73(1):10-19. doi: 10.1111/biom.12540. Epub 2016 May 10.

Abstract

In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, the problem of identifying structure in high-dimensional genomic data motivates this work. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees, features that arguably are lacking in the current alternative algorithms. We demonstrate the advantages of our approach, which include stably and reproducibly identifying biclusterings, on simulated and real microarray data.
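
For orientation, a convex biclustering objective of the kind the abstract describes can be sketched as a least-squares fit plus fused penalties on the columns and rows of the estimate. The abstract itself does not state a formula, so every symbol below is an assumption introduced for illustration: $X$ is the data matrix, $U$ is its estimate, $\gamma \ge 0$ is the single tuning parameter, and $w_{ij}$ and $\tilde{w}_{kl}$ are nonnegative weights on pairs of columns and pairs of rows.

```latex
% Illustrative sketch only; notation is assumed, not quoted from the abstract.
% X, U are p-by-n matrices; U_{\cdot i} is column i of U, U_{k \cdot} is row k of U.
\[
  \min_{U \in \mathbb{R}^{p \times n}} \;
  \frac{1}{2}\,\lVert X - U \rVert_{\mathrm{F}}^{2}
  \;+\; \gamma \Bigl[
      \sum_{i < j} w_{ij}\,\lVert U_{\cdot i} - U_{\cdot j} \rVert_{2}
      \;+\;
      \sum_{k < l} \tilde{w}_{kl}\,\lVert U_{k \cdot} - U_{l \cdot} \rVert_{2}
  \Bigr]
\]
```

Under this sketch, the squared-error term is strongly convex and the penalty is convex, so the objective has a unique global minimizer for each $\gamma$. At $\gamma = 0$ the minimizer is $U = X$; as $\gamma$ increases, columns and rows of $U$ tend to fuse, which is one way to read the "solution path" of biclusters mentioned in the abstract.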
