COBRAC: a fast implementation of convex biclustering with compression.

Author information

Yi Haidong, Huang Le, Mishne Gal, Chi Eric C

Affiliations

Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Department of Genetics, Curriculum in Bioinformatics & Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Publication information

Bioinformatics. 2021 Oct 25;37(20):3667-3669. doi: 10.1093/bioinformatics/btab248.

DOI:10.1093/bioinformatics/btab248

PMID:33904580

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8545294/

Abstract

SUMMARY

Biclustering is a generalization of clustering used to identify simultaneous grouping patterns in observations (rows) and features (columns) of a data matrix. Recently, the biclustering task has been formulated as a convex optimization problem. While this convex recasting of the problem has attractive properties, existing algorithms do not scale well. To address this problem and make convex biclustering a practical tool for analyzing larger data, we propose an implementation of fast convex biclustering called COBRAC to reduce the computing time by iteratively compressing problem size along with the solution path. We apply COBRAC to several gene expression datasets to demonstrate its effectiveness and efficiency. Besides the standalone version for COBRAC, we also developed a related online web server for online calculation and visualization of the downloadable interactive results.
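For orientation, the convex formulation mentioned in the summary is commonly written as a penalized least-squares problem with fusion penalties on both rows and columns. The block below is a sketch in assumed notation, since this abstract defines none of these symbols: X is the n-by-p data matrix, U is its biclustered estimate, γ ≥ 0 is a tuning parameter, and w and w̃ are nonnegative row and column weights. It is not taken from the abstract and may differ in detail from the objective COBRAC solves.

```latex
\min_{U \in \mathbb{R}^{n \times p}} \;
  \frac{1}{2}\,\lVert X - U \rVert_F^2
  + \gamma \Bigl[
      \sum_{i<j} w_{ij}\, \lVert U_{i\cdot} - U_{j\cdot} \rVert_2
      + \sum_{k<l} \tilde{w}_{kl}\, \lVert U_{\cdot k} - U_{\cdot l} \rVert_2
    \Bigr]
```

Under this form, increasing γ drives rows and columns of U to fuse into shared values, tracing out a solution path. A fused group can in principle be represented by a single row or column, which is the kind of reduction in problem size that the summary calls compression.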

AVAILABILITY AND IMPLEMENTATION

The source code and test data are available at https://github.com/haidyi/cvxbiclustr or https://zenodo.org/record/4620218. The web server is available at https://cvxbiclustr.ericchi.com.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
