The Potential of Automatic Word Comparison for Historical Linguistics.

Author information

List Johann-Mattis, Greenhill Simon J, Gray Russell D

Affiliations

Centre des Recherches Linguistiques sur l'Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France.

Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.

Publication information

PLoS One. 2017 Jan 27;12(1):e0170046. doi: 10.1371/journal.pone.0170046. eCollection 2017.

Abstract

The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help by pre-analyzing data that experts can later refine. In this way, computational approaches can take care of the repetitive and schematic tasks, leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method, Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.
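To make the task concrete, the following is a minimal sketch of what automatic cognate detection and its evaluation against expert judgments can look like. It is not the method evaluated in the paper: it assumes normalized edit distance as the similarity measure, single-linkage grouping under a hypothetical threshold of 0.4, and pairwise precision and recall as the score. The word forms are simplified toy transcriptions for the concept "hand", and all function names are illustrative.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_words(words, threshold):
    """Single-linkage grouping: link any two words whose distance is
    at or below the threshold, then return one cluster label per word."""
    parent = list(range(len(words)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(words)), 2):
        if normalized_distance(words[i], words[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(words))]


def pairwise_scores(predicted, gold):
    """Precision and recall over word pairs placed in the same set."""
    pairs = list(combinations(range(len(gold)), 2))
    pred_pairs = {p for p in pairs if predicted[p[0]] == predicted[p[1]]}
    gold_pairs = {p for p in pairs if gold[p[0]] == gold[p[1]]}
    true_pos = len(pred_pairs & gold_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 1.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 1.0
    return precision, recall


# Toy data (assumption, not from the paper): three Germanic and
# three Romance forms for "hand", with expert-style cognate set IDs.
words = ["hand", "hant", "hant", "main", "mano", "mano"]
gold = [0, 0, 0, 1, 1, 1]

labels = cluster_words(words, threshold=0.4)
precision, recall = pairwise_scores(labels, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy input the sketch groups the Germanic forms correctly but fails to link "main" to "mano", which illustrates the kind of weakness that purely surface-based comparison can have when true cognates have diverged in form.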

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c292/5271327/11c9e0bc015d/pone.0170046.g001.jpg