Euclide, the crow, the wolf and the pedestrian: distance metrics for linguistic typology.

Authors

Matías Guzmán Naranjo, Gerhard Jäger

Affiliations

Linguistics, Albert-Ludwigs-Universität Freiburg, Freiburg, Baden-Württemberg, 79085, Germany.

Seminar für Sprachwissenschaft, Eberhard Karls Universität Tübingen, Tübingen, Baden-Württemberg, 72074, Germany.

Publication information

Open Res Eur. 2024 Jul 2;3:104. doi: 10.12688/openreseurope.16141.2. eCollection 2023.

DOI: 10.12688/openreseurope.16141.2
PMID: 38989155
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11234076/

Abstract

Researchers working on linguistic geography, language contact, and typology commonly use some type of distance metric between lects. However, most work so far has used either Euclidean or geodesic distances, neither of which represents the real separation between communities very accurately. This paper presents two datasets, one on walking distances and one on topographic distances, covering over 8700 lects across all macro-areas. We calculated walking distances using OpenStreetMap data and topographic distances using digital elevation data. We evaluate these distance metrics on three case studies and show that, of the four distances, the topographic and geodesic distances performed most consistently across datasets and are likely to be reasonable first choices. At the same time, in most cases the Euclidean distances were not much worse than the other distances, and they might be a good enough approximation when performance is critical, or when the dataset covers very large areas and the point-location information is not very precise.
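
To make the contrast between two of the four metrics concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say how the Euclidean or geodesic distances were computed, so the sketch rests on two assumptions: the Earth is treated as a sphere, and "Euclidean" is read as the straight-line chord between two points in 3D space. The function names and the example coordinates (approximate locations of Freiburg and Tübingen) are illustrative only. Walking and topographic distances need routing and elevation data and are not covered here.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)


def geodesic_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def to_xyz(lat, lon):
    """Convert latitude/longitude in degrees to 3D Cartesian coordinates in km."""
    phi, lmb = math.radians(lat), math.radians(lon)
    return (
        EARTH_RADIUS_KM * math.cos(phi) * math.cos(lmb),
        EARTH_RADIUS_KM * math.cos(phi) * math.sin(lmb),
        EARTH_RADIUS_KM * math.sin(phi),
    )


def euclidean_km(lat1, lon1, lat2, lon2):
    """Straight-line (chord) distance through the sphere between two points."""
    return math.dist(to_xyz(lat1, lon1), to_xyz(lat2, lon2))


# Approximate, illustrative coordinates (hypothetical example points).
freiburg = (47.999, 7.842)
tuebingen = (48.521, 9.057)

print("geodesic :", round(geodesic_km(*freiburg, *tuebingen), 2), "km")
print("euclidean:", round(euclidean_km(*freiburg, *tuebingen), 2), "km")
```

Under these assumptions the chord is never longer than the great-circle arc, and the two are nearly identical for nearby points. The gap grows only at continental scales, which fits the abstract's remark that Euclidean distances are often a good enough approximation.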

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c02e/11234173/a0692f17b96a/openreseurope-3-18965-g0000.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c02e/11234173/52b74aca24bc/openreseurope-3-18965-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c02e/11234173/b2915c60bd97/openreseurope-3-18965-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c02e/11234173/74f814b2baa9/openreseurope-3-18965-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c02e/11234173/4cb791f34e34/openreseurope-3-18965-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c02e/11234173/3989fcf9d4e6/openreseurope-3-18965-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c02e/11234173/4b5307f2021f/openreseurope-3-18965-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c02e/11234173/ab8ae3655e0f/openreseurope-3-18965-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c02e/11234173/f2c025438d7b/openreseurope-3-18965-g0008.jpg

Similar articles

1
Euclide, the crow, the wolf and the pedestrian: distance metrics for linguistic typology.
Open Res Eur. 2024 Jul 2;3:104. doi: 10.12688/openreseurope.16141.2. eCollection 2023.
2
Comparison of distance measures in spatial analytical modeling for health service planning.
BMC Health Serv Res. 2009 Nov 6;9:200. doi: 10.1186/1472-6963-9-200.
3
Word Order Typology Interacts With Linguistic Complexity: A Cross-Linguistic Corpus Study.
Cogn Sci. 2020 Apr;44(4):e12822. doi: 10.1111/cogs.12822.
4
Development and Evaluation of Geostatistical Methods for Non-Euclidean-Based Spatial Covariance Matrices.
Math Geosci. 2019 Aug;51(6):767-791. doi: 10.1007/s11004-019-09791-y. Epub 2019 Mar 14.
5
Evaluation of standard and semantically-augmented distance metrics for neurology patients.
BMC Med Inform Decis Mak. 2020 Aug 26;20(1):203. doi: 10.1186/s12911-020-01217-8.
6
Grid cells, place cells, and geodesic generalization for spatial reinforcement learning.
PLoS Comput Biol. 2011 Oct;7(10):e1002235. doi: 10.1371/journal.pcbi.1002235. Epub 2011 Oct 27.
7
An efficient algorithm for approximating geodesic distances in tree space.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Sep-Oct;8(5):1196-207. doi: 10.1109/TCBB.2010.121.
8
Optimum design of chamfer distance transforms.
IEEE Trans Image Process. 1998;7(10):1477-84. doi: 10.1109/83.718487.
9
Streams over mountains: influence of riparian connectivity on gene flow in the Pacific jumping mouse (Zapus trinotatus).
Mol Ecol. 2005 Jun;14(7):1925-37. doi: 10.1111/j.1365-294X.2005.02568.x.
10
Approximate geodesic distances reveal biologically relevant structures in microarray data.
Bioinformatics. 2004 Apr 12;20(6):874-80. doi: 10.1093/bioinformatics/btg496. Epub 2004 Jan 29.

Cited by

1
Gradient in grammatical structure of indigenous languages reflects pathway of human expansion in the Americas.
Sci Rep. 2025 Apr 24;15(1):14365. doi: 10.1038/s41598-025-86265-8.
2
Consonant lengthening marks the beginning of words across a diverse sample of languages.
Nat Hum Behav. 2024 Nov;8(11):2127-2138. doi: 10.1038/s41562-024-01988-4. Epub 2024 Sep 24.
3
Gaussian process models for geographic controls in phylogenetic trees.
Open Res Eur. 2024 Jan 22;3:57. doi: 10.12688/openreseurope.15490.2. eCollection 2023.