Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, United States.
Faculty of Sciences, Holon Institute of Technology, Holon 58109, Israel.
Bioinformatics. 2024 Jan 2;40(1). doi: 10.1093/bioinformatics/btad751.
In the mixed-membership unsupervised clustering analyses commonly used in population genetics, multiple replicate data analyses can differ in their clustering solutions. Combinatorial algorithms assist in aligning clustering outputs from multiple replicates so that clustering solutions can be interpreted and combined across replicates. Although several algorithms have been introduced, challenges exist in achieving optimal alignments and performing alignments in reasonable computation time.
We present Clumppling, a method for aligning replicate solutions in mixed-membership unsupervised clustering. The method uses integer linear programming to find optimal alignments, embedding the cluster alignment problem in standard combinatorial optimization frameworks. In example analyses, we find that it achieves better values of the alignment objective function than Pong and requires less computation time than Clumpak. It is also the first method to permit alignment across replicates with multiple arbitrary values of the number of clusters K.
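To make the alignment problem concrete, here is a minimal sketch under stated assumptions. It is not Clumppling's implementation or API. Each replicate is taken to be an N x K membership matrix (rows are individuals, columns are cluster proportions). The alignment cost is assumed to be the sum of squared differences. The hypothetical helpers `align_same_k` and `align_diff_k` solve small instances exactly by brute-force enumeration, which stands in for the integer linear program that Clumppling uses. For equal K, the search is over column permutations. For a smaller K against a larger K', each of the K' clusters is mapped to one of the K clusters, every smaller-K cluster must receive at least one, and the merged columns are summed.

```python
from itertools import permutations, product


def cost(P, Q):
    """Assumed objective: sum of squared differences between two N x K matrices."""
    return sum((p - q) ** 2 for row_p, row_q in zip(P, Q) for p, q in zip(row_p, row_q))


def align_same_k(P, Q):
    """Hypothetical helper: best column permutation of Q to match P (equal K).

    Brute force over all K! permutations; a stand-in for an ILP/assignment solver.
    Returns (cost, perm), where perm[j] is the column of Q placed at position j.
    """
    k = len(P[0])
    best = None
    for perm in permutations(range(k)):
        Q_perm = [[row[perm[j]] for j in range(k)] for row in Q]
        c = cost(P, Q_perm)
        if best is None or c < best[0]:
            best = (c, perm)
    return best


def align_diff_k(P, Q):
    """Hypothetical helper: map the K' columns of Q onto the K columns of P (K <= K').

    m[i] is the P-cluster that Q-cluster i is assigned to; Q columns sharing a
    target are summed. Every P-cluster must receive at least one Q-cluster.
    Brute force over K**K' mappings, so only usable for small K'.
    Returns (cost, m).
    """
    k, k2 = len(P[0]), len(Q[0])
    best = None
    for m in product(range(k), repeat=k2):
        if len(set(m)) < k:  # skip mappings that leave a P-cluster empty
            continue
        merged = [[sum(row[i] for i in range(k2) if m[i] == j) for j in range(k)] for row in Q]
        c = cost(P, merged)
        if best is None or c < best[0]:
            best = (c, m)
    return best


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]             # replicate 1, K = 2
    Q = [[0.1, 0.9], [0.8, 0.2]]             # replicate 2, K = 2, labels swapped
    Q3 = [[0.5, 0.4, 0.1], [0.1, 0.1, 0.8]]  # replicate 3, K = 3
    print(align_same_k(P, Q))   # permutation (1, 0) undoes the label swap
    print(align_diff_k(P, Q3))  # mapping (0, 0, 1) merges the first two K=3 clusters
```

Enumeration grows factorially (or exponentially for differing K), which is exactly why an integer linear programming formulation is attractive for realistic numbers of clusters.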
Clumppling is available at https://github.com/PopGenClustering/Clumppling.