Using lexical language models to detect borrowings in monolingual wordlists.

Affiliations

Artificial Intelligence/Engineering, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru.

Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany.

Publication information

PLoS One. 2020 Dec 9;15(12):e0242709. doi: 10.1371/journal.pone.0242709. eCollection 2020.

DOI:10.1371/journal.pone.0242709
PMID:33296372
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7725347/
Abstract

Lexical borrowing, the transfer of words from one language to another, is one of the most frequent processes in language evolution. In order to detect borrowings, linguists make use of various strategies, combining evidence from various sources. Despite the increasing popularity of computational approaches in comparative linguistics, automated approaches to lexical borrowing detection are still in their infancy, disregarding many aspects of the evidence that is routinely considered by human experts. One example of this kind of evidence is phonological and phonotactic clues that are especially useful for the detection of recent borrowings that have not yet been adapted to the structure of their recipient languages. In this study, we test how these clues can be exploited in automated frameworks for borrowing detection. By modeling phonology and phonotactics with Support Vector Machines, Markov models, and recurrent neural networks, we propose a framework for the supervised detection of borrowings in monolingual wordlists. Based on a substantially revised dataset in which lexical borrowings have been thoroughly annotated for 41 different languages from different families, featuring a large typological diversity, we use these models to conduct a series of experiments to investigate their performance in monolingual borrowing detection. While the general results appear largely unsatisfying at first glance, further tests show that the performance of our models improves with increasing amounts of attested borrowings and in those cases where most borrowings were introduced by one donor language alone. Our results show that phonological and phonotactic clues derived from monolingual language data alone are often not sufficient to detect borrowings when using them in isolation. Based on our detailed findings, however, we hope that they could prove useful in integrated approaches that take multilingual information into account.
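The abstract names Markov models among the tools used to model phonotactics for supervised borrowing detection. As a minimal sketch of that general idea, and not the authors' implementation (which also uses Support Vector Machines and recurrent neural networks), the code below trains one bigram Markov model on words annotated as inherited and another on words annotated as borrowed, then labels a new word by whichever model assigns it the higher log-probability. The space-separated segment input format, the boundary markers `^` and `$`, add-one smoothing, and the class names `BigramModel` and `BorrowingDetector` are all hypothetical choices for illustration, and the toy words are invented rather than taken from the paper's dataset.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical word-boundary markers added around every word


def segments(word):
    """Split a word given as space-separated sound segments (assumed input format)."""
    return [START] + word.split() + [END]


class BigramModel:
    """Bigram Markov model over sound segments with add-one (Laplace) smoothing."""

    def __init__(self, words, alphabet):
        self.counts = defaultdict(Counter)
        # Possible "next" symbols: every segment in the shared alphabet plus END.
        self.vocab_size = len(alphabet) + 1
        for word in words:
            segs = segments(word)
            for prev, nxt in zip(segs, segs[1:]):
                self.counts[prev][nxt] += 1

    def log_prob(self, word):
        """Sum of smoothed log P(next segment | previous segment) over the word."""
        segs = segments(word)
        total = 0.0
        for prev, nxt in zip(segs, segs[1:]):
            row = self.counts.get(prev, Counter())  # unseen context -> empty row
            total += math.log((row[nxt] + 1) / (sum(row.values()) + self.vocab_size))
        return total


class BorrowingDetector:
    """Labels a word as borrowed if the 'borrowed' model scores it higher."""

    def __init__(self, inherited, borrowed):
        # Shared alphabet so both models smooth over the same symbol inventory.
        alphabet = {s for w in inherited + borrowed for s in w.split()}
        self.inherited = BigramModel(inherited, alphabet)
        self.borrowed = BigramModel(borrowed, alphabet)

    def is_borrowed(self, word):
        return self.borrowed.log_prob(word) > self.inherited.log_prob(word)


# Hypothetical toy wordlist, not taken from the paper's dataset.
inherited_words = ["p a t a", "k o l a", "m a n o", "t a k i"]
borrowed_words = ["s t r a k", "b l o k s", "s p r i n t"]
detector = BorrowingDetector(inherited_words, borrowed_words)

print(detector.is_borrowed("s t r o k"))  # True: initial cluster, like the borrowed set
print(detector.is_borrowed("p a k o"))    # False: CV shape, like the inherited set
```

Because the decision compares raw likelihoods, the borrowed model is estimated from very little data when a language has few annotated borrowings, so its scores are noisy; adding log class priors or tuning a decision threshold are common adjustments to such a sketch.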
