Clumppling: cluster matching and permutation program with integer linear programming.

Author affiliations

Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, United States.

Faculty of Sciences, Holon Institute of Technology, Holon 58109, Israel.

Publication information

Bioinformatics. 2024 Jan 2;40(1). doi: 10.1093/bioinformatics/btad751.

DOI: 10.1093/bioinformatics/btad751

PMID: 38096585

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10766593/

Abstract

MOTIVATION

In the mixed-membership unsupervised clustering analyses commonly used in population genetics, multiple replicate data analyses can differ in their clustering solutions. Combinatorial algorithms assist in aligning clustering outputs from multiple replicates so that clustering solutions can be interpreted and combined across replicates. Although several algorithms have been introduced, challenges exist in achieving optimal alignments and performing alignments in reasonable computation time.

RESULTS

We present Clumppling, a method for aligning replicate solutions in mixed-membership unsupervised clustering. The method uses integer linear programming for finding optimal alignments, embedding the cluster alignment problem in standard combinatorial optimization frameworks. In example analyses, we find that it achieves solutions with preferred values of a desired objective function relative to those achieved by Pong and that it proceeds with less computation time than Clumpak. It is also the first method to permit alignments across replicates with multiple arbitrary values of the number of clusters K.
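The alignment of two replicates with the same K can be viewed as an assignment problem: choose binary variables x[k][l] that equal 1 when cluster l of a replicate is matched to cluster k of a reference, require each row and each column of x to sum to 1, and minimize the total mismatch cost. The sketch below illustrates that formulation on a toy example only. It is not Clumppling's implementation: the squared-difference cost, the function names `alignment_cost` and `align_replicate`, and the toy membership matrices are all assumptions made for illustration, and the feasible set is searched by exhaustive enumeration of permutations instead of an integer linear programming solver.

```python
from itertools import permutations


def alignment_cost(q_ref, q_rep, perm):
    """Sum of squared differences when column perm[k] of q_rep is
    matched to column k of q_ref (assumed cost, for illustration)."""
    return sum(
        (row_ref[k] - row_rep[perm[k]]) ** 2
        for row_ref, row_rep in zip(q_ref, q_rep)
        for k in range(len(perm))
    )


def align_replicate(q_ref, q_rep):
    """Return the best column permutation and the relabeled replicate.

    Each permutation corresponds to one feasible binary assignment
    matrix (one 1 per row and per column). Enumeration is practical
    only for small K; an ILP solver searches the same feasible set.
    """
    n_clusters = len(q_ref[0])
    best = min(
        permutations(range(n_clusters)),
        key=lambda p: alignment_cost(q_ref, q_rep, p),
    )
    aligned = [[row[j] for j in best] for row in q_rep]
    return best, aligned


# Toy membership matrices: rows are individuals, columns are clusters.
# q_rep is q_ref with its cluster labels switched.
q_ref = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.1, 0.9]]
q_rep = [[0.0, 0.9, 0.1], [0.1, 0.2, 0.7], [0.9, 0.0, 0.1]]

perm, aligned = align_replicate(q_ref, q_rep)
print(perm)     # column of q_rep matched to each reference cluster
print(aligned)  # q_rep with its columns relabeled to match q_ref
```

Because the number of permutations grows as K!, enumeration stops being feasible quickly, which is one reason to hand the same constraints to an optimization solver.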

AVAILABILITY AND IMPLEMENTATION

Clumppling is available at https://github.com/PopGenClustering/Clumppling.

Similar articles

CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure.

Bioinformatics. 2007 Jul 15;23(14):1801-6. doi: 10.1093/bioinformatics/btm233. Epub 2007 May 7.

Alignment of biological networks by integer linear programming: virus-host protein-protein interaction networks.

BMC Bioinformatics. 2020 Nov 18;21(Suppl 6):434. doi: 10.1186/s12859-020-03733-w.

Clumpak: a program for identifying clustering modes and packaging population structure inferences across K.

Mol Ecol Resour. 2015 Sep;15(5):1179-91. doi: 10.1111/1755-0998.12387. Epub 2015 Feb 27.

pong: fast analysis and visualization of latent clusters in population genetic data.

Bioinformatics. 2016 Sep 15;32(18):2817-23. doi: 10.1093/bioinformatics/btw327. Epub 2016 Jun 9.

A Dirichlet model of alignment cost in mixed-membership unsupervised clustering.

J Comput Graph Stat. 2023;32(3):1145-1159. doi: 10.1080/10618600.2022.2127739. Epub 2022 Nov 14.

Sequential computation of elementary modes and minimal cut sets in genome-scale metabolic networks using alternate integer linear programming.

Bioinformatics. 2017 Aug 1;33(15):2345-2353. doi: 10.1093/bioinformatics/btx171.

Sequoya: multiobjective multiple sequence alignment in Python.

Bioinformatics. 2020 Jun 1;36(12):3892-3893. doi: 10.1093/bioinformatics/btaa257.

MAGUS: Multiple sequence Alignment using Graph clUStering.

Bioinformatics. 2021 Jul 19;37(12):1666-1672. doi: 10.1093/bioinformatics/btaa992.

Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming.

BMC Bioinformatics. 2017 Dec 6;18(1):537. doi: 10.1186/s12859-017-1944-x.

Cited by

Revealing the range of equally likely estimates in the admixture model.

G3 (Bethesda). 2025 Aug 6;15(8). doi: 10.1093/g3journal/jkaf142.

References

A Dirichlet model of alignment cost in mixed-membership unsupervised clustering.

J Comput Graph Stat. 2023;32(3):1145-1159. doi: 10.1080/10618600.2022.2127739. Epub 2022 Nov 14.

Major inconsistencies of inferred population genetic structure estimated in a large set of domestic horse breeds using microsatellites.

Ecol Evol. 2020 Apr 12;10(10):4261-4279. doi: 10.1002/ece3.6195. eCollection 2020 May.

Statistical test for detecting community structure in real-valued edge-weighted graphs.

PLoS One. 2018 Mar 20;13(3):e0194079. doi: 10.1371/journal.pone.0194079. eCollection 2018.

Parallel Trajectories of Genetic and Linguistic Admixture in a Genetically Admixed Creole Population.

Curr Biol. 2017 Aug 21;27(16):2529-2535.e3. doi: 10.1016/j.cub.2017.07.002. Epub 2017 Aug 10.

pong: fast analysis and visualization of latent clusters in population genetic data.

Bioinformatics. 2016 Sep 15;32(18):2817-23. doi: 10.1093/bioinformatics/btw327. Epub 2016 Jun 9.

XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data.

BMC Bioinformatics. 2015;16 Suppl 11(Suppl 11):S5. doi: 10.1186/1471-2105-16-S11-S5. Epub 2015 Aug 13.

Clumpak: a program for identifying clustering modes and packaging population structure inferences across K.

Mol Ecol Resour. 2015 Sep;15(5):1179-91. doi: 10.1111/1755-0998.12387. Epub 2015 Feb 27.

Fast model-based estimation of ancestry in unrelated individuals.

Genome Res. 2009 Sep;19(9):1655-64. doi: 10.1101/gr.094052.109. Epub 2009 Jul 31.

Genetic variation and population structure in native Americans.

PLoS Genet. 2007 Nov;3(11):e185. doi: 10.1371/journal.pgen.0030185.

CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure.

Bioinformatics. 2007 Jul 15;23(14):1801-6. doi: 10.1093/bioinformatics/btm233. Epub 2007 May 7.
