Guzmán Naranjo Matías, Jäger Gerhard
Linguistics, Albert-Ludwigs-Universität Freiburg, Freiburg, Baden-Württemberg, 79085, Germany.
Seminar für Sprachwissenschaft, Eberhard Karls Universität Tübingen, Tübingen, Baden-Württemberg, 72074, Germany.
Open Res Eur. 2024 Jul 2;3:104. doi: 10.12688/openreseurope.16141.2. eCollection 2023.
Researchers working on linguistic geography, language contact and typology commonly rely on some type of distance metric between lects. However, most work so far has used either Euclidean or geodesic distances, neither of which represents the real separation between communities very accurately. This paper presents two datasets: one of walking distances and one of topographic distances between more than 8700 lects across all macro-areas. We calculated walking distances using OpenStreetMap data, and topographic distances using digital elevation data. We evaluate these distance metrics on three case studies and show that, of the four distances, the topographic and geodesic distances performed most consistently across datasets and are likely to be reasonable first choices. At the same time, in most cases the Euclidean distances were not much worse than the others, and they may be a good enough approximation when performance is critical, or when the dataset covers very large areas and the point-location information is not very precise.
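To make the contrast between two of the baseline metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes a spherical Earth, computes the geodesic (great-circle) distance with the haversine formula, and takes "Euclidean" to mean the straight-line chord between the two points in 3D space; the abstract does not say how the Euclidean distance was defined, so that reading is an assumption. The function names and the radius constant are illustrative.

```python
import math

# Assumption: spherical Earth with the mean radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (geodesic on a sphere) distance between two points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def _to_xyz(lat, lon):
    """Convert latitude/longitude in degrees to 3D Cartesian coordinates in km."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (EARTH_RADIUS_KM * math.cos(phi) * math.cos(lam),
            EARTH_RADIUS_KM * math.cos(phi) * math.sin(lam),
            EARTH_RADIUS_KM * math.sin(phi))


def euclidean_chord_km(lat1, lon1, lat2, lon2):
    """Straight-line (chord) distance through the sphere in km.

    Assumption: this is one possible reading of "Euclidean distance";
    the source abstract does not specify the definition used.
    """
    return math.dist(_to_xyz(lat1, lon1), _to_xyz(lat2, lon2))


if __name__ == "__main__":
    # Approximate city coordinates, used only as illustrative points;
    # they are not lect locations taken from the datasets.
    freiburg = (47.99, 7.85)
    tuebingen = (48.52, 9.06)
    print("geodesic :", haversine_km(*freiburg, *tuebingen))
    print("euclidean:", euclidean_chord_km(*freiburg, *tuebingen))
```

Under these assumptions the chord is never longer than the great-circle distance, and the two are nearly identical at short range. Neither accounts for terrain or routes, which is the gap the walking and topographic distances are meant to address.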