Yi Haidong, Huang Le, Mishne Gal, Chi Eric C
Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Department of Genetics, Curriculum in Bioinformatics & Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Bioinformatics. 2021 Oct 25;37(20):3667-3669. doi: 10.1093/bioinformatics/btab248.
Biclustering is a generalization of clustering that simultaneously groups the observations (rows) and features (columns) of a data matrix. Recently, the biclustering task has been formulated as a convex optimization problem. While this convex formulation has attractive properties, existing algorithms do not scale well. To make convex biclustering a practical tool for analyzing larger datasets, we propose COBRAC, a fast implementation of convex biclustering that reduces computing time by iteratively compressing the problem size along the solution path. We apply COBRAC to several gene expression datasets to demonstrate its effectiveness and efficiency. In addition to the standalone version of COBRAC, we developed a companion web server for online computation and visualization, with interactive results available for download.
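The abstract does not write out the convex formulation, so the following is only an illustrative sketch. It assumes the objective commonly used in the convex biclustering literature: a squared-error fit term plus a penalty, scaled by a tuning parameter gamma, on the Euclidean distances between every pair of rows and every pair of columns of the estimate U. The function name, the uniform default weights, and the toy matrix are assumptions made for illustration. The code only evaluates the objective; it is not the COBRAC algorithm and does not perform the compression step.

```python
import math
from itertools import combinations


def _l2(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def convex_biclustering_objective(X, U, gamma, row_w=None, col_w=None):
    """Evaluate 0.5 * ||X - U||_F^2 + gamma * (row penalty + column penalty).

    X, U  : lists of lists with the same shape (data and candidate estimate).
    gamma : non-negative tuning parameter; larger values favor more fusion.
    row_w, col_w : optional pairwise weight matrices (assumed uniform if None).
    """
    n_rows, n_cols = len(X), len(X[0])
    # Columns of U, so that row and column penalties share one helper.
    cols = [[U[i][j] for i in range(n_rows)] for j in range(n_cols)]

    # Fit term: half the squared Frobenius norm of the residual.
    fit = 0.5 * sum(
        (X[i][j] - U[i][j]) ** 2 for i in range(n_rows) for j in range(n_cols)
    )
    # Fusion penalties: weighted distances between all pairs of rows / columns.
    row_pen = sum(
        (row_w[i][j] if row_w else 1.0) * _l2(U[i], U[j])
        for i, j in combinations(range(n_rows), 2)
    )
    col_pen = sum(
        (col_w[i][j] if col_w else 1.0) * _l2(cols[i], cols[j])
        for i, j in combinations(range(n_cols), 2)
    )
    return fit + gamma * (row_pen + col_pen)


# Toy data (hypothetical): two rows with clearly different means.
X = [[0.0, 0.0], [2.0, 2.0]]
U_unfused = X                          # perfect fit, rows kept apart
U_fused = [[1.0, 1.0], [1.0, 1.0]]     # every entry fused to the grand mean

obj_unfused = convex_biclustering_objective(X, U_unfused, gamma=1.0)
obj_fused = convex_biclustering_objective(X, U_fused, gamma=1.0)
```

Under this assumed objective, the unfused estimate pays only the penalty and the fully fused estimate pays only the fit term, so the value of gamma determines which one is preferred. Sweeping gamma from small to large traces the kind of solution path the abstract refers to.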
The source code and test data are available at https://github.com/haidyi/cvxbiclustr or https://zenodo.org/record/4620218. The web server is available at https://cvxbiclustr.ericchi.com.
Supplementary data are available at Bioinformatics online.