Generalising Ward's Method for Use with Manhattan Distances.

Author information

Strauss Trudie, von Maltitz Michael Johan

Affiliation

Department of Mathematical Statistics and Actuarial Science, University of the Free State, Bloemfontein, South Africa.

Publication information

PLoS One. 2017 Jan 13;12(1):e0168288. doi: 10.1371/journal.pone.0168288. eCollection 2017.

Abstract

The claim that Ward's linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward's clustering algorithm is generalised for use with l1-norm, or Manhattan, distances. We argue that the generalisation of Ward's linkage method to incorporate Manhattan distances is theoretically sound, and we provide an example in which this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages, and we use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward's method of hierarchical clustering to classify the languages, using both the Euclidean distance and the Manhattan distance. The results obtained with the two distance metrics are compared to show that the characteristic property of Ward's algorithm, minimising intra-cluster variation and maximising inter-cluster variation, is not violated when using the Manhattan metric.
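The pipeline described above (bi-gram signatures, then a distance matrix, then Ward's hierarchical clustering under each metric) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical. The bi-gram rules (lower-casing, letters only, pairs counted within words) are assumed. Ward's linkage is implemented with the standard Lance-Williams update, which reproduces classical Ward exactly when it is given squared Euclidean distances. Feeding the same update Manhattan distances is one plausible reading of the generalisation and is not necessarily the one derived in the paper.

```python
from collections import Counter
from itertools import combinations


def bigram_signature(text):
    """Relative frequencies of letter bi-grams (two adjacent letters) in text.

    Assumption for illustration: text is lower-cased, non-letters are dropped,
    and bi-grams are counted within words only."""
    counts = Counter()
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()} if total else {}


def manhattan(p, q):
    """l1 (Manhattan) distance between two sparse frequency vectors."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in p.keys() | q.keys())


def sq_euclidean(p, q):
    """Squared Euclidean distance, the form classical Ward updates expect."""
    return sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in p.keys() | q.keys())


def ward_cluster(signatures, metric):
    """Agglomerative clustering with the Lance-Williams update for Ward's linkage.

    signatures maps a label (e.g. a language name) to its frequency vector.
    Returns merges as (cluster_i, cluster_j, height), clusters as frozensets."""
    clusters = {frozenset([label]) for label in signatures}
    # Initial distance matrix, keyed by the unordered pair of singleton clusters.
    d = {
        frozenset([a, b]): metric(signatures[next(iter(a))], signatures[next(iter(b))])
        for a, b in combinations(clusters, 2)
    }
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        h = d.pop(pair)
        i, j = tuple(pair)
        clusters -= {i, j}
        merged = i | j
        for k in clusters:
            ni, nj, nk = len(i), len(j), len(k)
            dki = d.pop(frozenset([k, i]))
            dkj = d.pop(frozenset([k, j]))
            # Lance-Williams recurrence with Ward's size-weighted coefficients.
            d[frozenset([k, merged])] = (
                (ni + nk) * dki + (nj + nk) * dkj - nk * h
            ) / (ni + nj + nk)
        clusters.add(merged)
        merges.append((i, j, h))
    return merges


if __name__ == "__main__":
    # Toy strings for illustration only; not the corpus used in the paper.
    texts = {
        "en": "the cat sat on the mat",
        "nl": "de kat zat op de mat",
        "de": "die Katze sass auf der Matte",
    }
    sigs = {lang: bigram_signature(t) for lang, t in texts.items()}
    for name, metric in [("manhattan", manhattan), ("sq_euclidean", sq_euclidean)]:
        for i, j, h in ward_cluster(sigs, metric):
            print(name, sorted(i), "+", sorted(j), f"at {h:.3f}")
```

Running both metrics over the same signatures gives two merge histories that can be compared directly. With Ward's coefficients, the update never produces a distance smaller than the merge height just used. Merge heights are therefore non-decreasing with either metric.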

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cfc/5235383/bb586e9738bb/pone.0168288.g001.jpg