Generalising Ward's Method for Use with Manhattan Distances.

Authors

Strauss Trudie, von Maltitz Michael Johan

Affiliation

Department of Mathematical Statistics and Actuarial Science, University of the Free State, Bloemfontein, South Africa.

Publication information

PLoS One. 2017 Jan 13;12(1):e0168288. doi: 10.1371/journal.pone.0168288. eCollection 2017.

DOI:10.1371/journal.pone.0168288
PMID:28085891
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5235383/
Abstract

The claim that Ward's linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward's clustering algorithm is generalised to use with l1 norm or Manhattan distances. We argue that the generalisation of Ward's linkage method to incorporate Manhattan distances is theoretically sound and provide an example of where this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward's method of hierarchical clustering to classify the languages, using the Euclidean distance and the Manhattan distance. Results obtained from using the different distance metrics are compared to show that the Ward's algorithm characteristic of minimising intra-cluster variation and maximising inter-cluster variation is not violated when using the Manhattan metric.
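The pipeline described in the abstract (relative bi-gram frequencies, a Manhattan distance matrix, then Ward-style agglomeration) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the toy texts, the choice to count bi-grams within words only, and the choice to apply the Lance-Williams form of Ward's update directly to the unsquared l1 dissimilarities.

```python
from collections import Counter
from itertools import combinations


def bigram_signature(text):
    """Relative frequencies of adjacent letter pairs, counted within words."""
    counts = Counter()
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()} if total else {}


def manhattan(p, q):
    """l1 distance between two sparse frequency dictionaries."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def ward_merges(labels, dist):
    """Agglomerate using the Lance-Williams form of Ward's update.

    `dist` maps frozenset({a, b}) to a dissimilarity. The update is applied
    to the dissimilarities as given (an assumption of this sketch).
    Returns a list of (cluster_a, cluster_b, merge_height).
    """
    size = {(name,): 1 for name in labels}  # clusters are tuples of labels
    d = {frozenset(((a,), (b,))): dist[frozenset((a, b))]
         for a, b in combinations(labels, 2)}
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        i, j = sorted(pair)
        height = d[pair]
        merged = tuple(sorted(i + j))
        ni, nj = size[i], size[j]
        updated = {}
        for k, nk in size.items():
            if k in (i, j):
                continue
            # Ward coefficients: weights depend on the three cluster sizes.
            updated[frozenset((k, merged))] = (
                (ni + nk) * d[frozenset((k, i))]
                + (nj + nk) * d[frozenset((k, j))]
                - nk * height
            ) / (ni + nj + nk)
        # Drop every distance involving the two clusters that were merged.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        d.update(updated)
        del size[i], size[j]
        size[merged] = ni + nj
        merges.append((i, j, height))
    return merges


# Toy "languages" (hypothetical strings, not data from the paper).
samples = {"x": "abab abab", "y": "abab abab ab", "z": "cdcd cdcd"}
signatures = {name: bigram_signature(text) for name, text in samples.items()}
distances = {frozenset((a, b)): manhattan(signatures[a], signatures[b])
             for a, b in combinations(samples, 2)}
for left, right, height in ward_merges(list(samples), distances):
    print(left, right, round(height, 4))
```

In the toy data, `x` and `y` share the same bi-grams in similar proportions, so they merge first; `z` shares none and joins last. Whether the update should act on squared or unsquared dissimilarities is a design choice the abstract does not settle, so this sketch should be read as one possible variant.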

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cfc/5235383/bb586e9738bb/pone.0168288.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cfc/5235383/5dea5a03df36/pone.0168288.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cfc/5235383/58e2e6eed086/pone.0168288.g003.jpg
