Oggier Frédérique, Datta Anwitaman
School of Physical & Mathematical Sciences, Nanyang Technological University Singapore, Singapore.
School of Computer Science & Engineering, Nanyang Technological University Singapore, Singapore.
PeerJ Comput Sci. 2023 Apr 20;9:e1339. doi: 10.7717/peerj-cs.1339. eCollection 2023.
This work is motivated by applications of parsimonious cladograms to the analysis of non-biological data. Parsimonious cladograms were introduced as a means to help understand the tree of life, and are now used in fields related to the biological sciences at large, e.g., to analyze viruses or to predict the structure of proteins. We revisit parsimonious cladograms through the lens of clustering and compare cladograms optimized for parsimony with dendrograms obtained from single linkage hierarchical clustering. We show that despite similarities between the two approaches, there exist datasets whose clustering dendrogram is incompatible with parsimony optimization. Furthermore, we provide numerical examples comparing the F-scores of the clusterings obtained through parsimonious cladograms and through single linkage hierarchical dendrograms.
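To make the two objects being compared concrete, here is a minimal Python sketch. It assumes taxa described by binary characters and uses Hamming distance between them, neither of which is specified in the abstract. It builds a single linkage dendrogram and then scores that tree with Fitch's algorithm, which counts the character state changes a parsimony method tries to minimize. The function names `hamming`, `single_linkage` and `fitch_score` and the toy dataset are hypothetical illustrations, not the authors' code or data.

```python
from itertools import combinations


def hamming(a, b):
    """Number of characters on which two binary sequences differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(data):
    """Agglomerative clustering with single linkage over Hamming distance.

    data: dict mapping taxon name -> tuple of 0/1 characters (assumed encoding).
    Returns a rooted binary tree: leaves are names, internal nodes are 2-tuples.
    Ties are broken by the first pair found in index order.
    """
    clusters = [(name, [name]) for name in data]  # (subtree, member names)
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # single linkage: distance between the closest members of two clusters
            d = min(hamming(data[a], data[b])
                    for a in clusters[i][1] for b in clusters[j][1])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def fitch_score(tree, data):
    """Fitch parsimony score: minimum number of state changes on a binary tree."""
    n_chars = len(next(iter(data.values())))

    def visit(node):
        # Returns (candidate state set per character, changes counted below node).
        if isinstance(node, str):
            return [{s} for s in data[node]], 0
        left, cl = visit(node[0])
        right, cr = visit(node[1])
        sets, changes = [], cl + cr
        for k in range(n_chars):
            inter = left[k] & right[k]
            if inter:
                sets.append(inter)
            else:
                # empty intersection: take the union and count one change
                sets.append(left[k] | right[k])
                changes += 1
        return sets, changes

    return visit(tree)[1]


# Toy example (hypothetical data).
data = {"A": (0, 0, 0), "B": (0, 0, 1), "C": (1, 1, 0), "D": (1, 1, 1)}
tree = single_linkage(data)
print(tree)                     # (('A', 'B'), ('C', 'D'))
print(fitch_score(tree, data))  # 4
```

On this toy dataset the single linkage tree happens to have a lower Fitch score (4) than the alternative topology `(("A", "C"), ("B", "D"))`, which scores 5. A parsimony search would explore topologies to minimize this score. The abstract's result is that for some datasets the tree produced by single linkage clustering is not compatible with parsimony optimization, so the two procedures can disagree.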