The Indo-European Cognate Relationships dataset.

Author information

Anderson Cormac, Scarborough Matthew, Jocz Lechosław, Kümmel Martin Joachim, Jügel Thomas, Irslinger Britta, Pooth Roland, Liljegren Henrik, Strand Richard F, Haig Geoffrey, Geupel Ulrich, Macak Martin, Kim Ronald I, Anonby Erik, Pronk Tijmen, Belyaev Oleg, Dewey-Findell Tonya Kim, Boutilier Matthew, Freiberg Cassandra, Tegethoff Robert, Serangeli Matilde, Stroński Krzysztof, Falileyev Alexander, Liosis Nikos, Schulte Kim, Gupta Ganesh Kumar, Izadifar Raheleh, Markus Patrycja, Williams Nicholas, Loi Simone, Sims-Williams Nicholas, Findell Martin, Adibifar Shirin, Abete Giovanni, Atanasov Petar, Baiwir Esther, Bastardas Maria-Reina, Benkato Adam, Bevevino Lisa Shugert, Buchi Éva, Cadorini Giorgio, Cathcart Chundra, Cheveau Loïc, Christodoulou Charalambos, Delorme Jérémie, Dworkin Steven N, Ekici Deniz, Farridnejad Shervin, Gheitasi Mojtaba, Hammarström Harald, Hewitt Steve, Khan Afsar Ali, Khan Muhammad Kamal, Khokhlova Liudmila, Kim Deborah, Lewin Christopher, Lushaj Borana, Mahmoudveysi Parvin, Mahommadirad Masoud, Mersch Sam, Mustafa Baydaa, Nemati Fatemeh, Nourzaei Maryam, Muircheartaigh Peadar Ó, Oogjen Virginia, Ourang Muhammed, Pagan Heather, Palmer Timothy S, Pepper Steve, Purandare Mandar, Rehman Khwaja, Rhys Guto, Røyneland Unn, Sagar Muhammad Zaman, Sandstedt Jade Jørgen, Steensland Lars, Taheri-Ardali Mortaza, Talebi-Dastenaei Mahnaz, Tittel Sabine, Tresoldi Tiago, de Vaan Michiel, Verkerk Annemarie, Versloot Arjen, Videsott Paul, Vuletić Nikola, Widmer Manuel, Zeini Arash, Bibiko Hans-Jörg, Runge Fiona, Gray Russell D, Heggarty Paul

Affiliations

Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103, Leipzig, Germany.

Surrey Morphology Group, University of Surrey, Guildford, Surrey, GU2 7XH, UK.

Publication information

Sci Data. 2025 Sep 2;12(1):1541. doi: 10.1038/s41597-025-05445-3.

Abstract

The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational dataset showing how related, inherited words ('cognates') pattern across 160 languages of the Indo-European family. IE-CoR is intended as a benchmark dataset for computational research into the evolution of the Indo-European languages. It is structured around 170 reference meanings in core lexicon, and contains 25731 lexeme entries, analysed into 4981 cognate sets. Novel, dedicated structures are used to code all known cases of horizontal transfer. All 13 main documented clades of Indo-European, and their main subclades, are well represented. Time calibration data for each language are also included, as are relevant geographical and social metadata. Data collection was performed by an expert consortium of 89 linguists drawing on 355 cited sources. The dataset is extendable to further languages and meanings and follows the Cross-Linguistic Data Format (CLDF) protocols for linguistic data. It is designed to be interoperable with other cross-linguistic datasets and catalogues, and provides a reference framework for similar initiatives for other language families.
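
As a rough illustration of the relational structure described above, the following Python sketch groups lexeme entries into cognate sets. It assumes the default CLDF Wordlist column names (`ID`, `Language_ID`, `Parameter_ID`, `Form` for lexemes; `Form_ID`, `Cognateset_ID` for cognate judgements). The rows, identifiers, and the helper `group_cognates` are hypothetical toy values for illustration, not data or code from IE-CoR, and the actual IE-CoR tables may carry additional columns.

```python
import csv
import io
from collections import defaultdict

# Toy rows for illustration only; NOT taken from the IE-CoR dataset.
FORMS_CSV = """ID,Language_ID,Parameter_ID,Form
f1,english,father,father
f2,german,father,Vater
f3,latin,father,pater
f4,english,dog,dog
f5,german,dog,Hund
"""

# Each row links one lexeme (Form_ID) to one cognate set (Cognateset_ID).
COGNATES_CSV = """ID,Form_ID,Cognateset_ID
c1,f1,father-1
c2,f2,father-1
c3,f3,father-1
c4,f4,dog-1
c5,f5,dog-2
"""


def group_cognates(forms_text, cognates_text):
    """Return {cognate set ID: [(language, form), ...]} from two CSV texts."""
    # Index lexemes by their ID so cognate rows can be resolved to forms.
    forms = {row["ID"]: row for row in csv.DictReader(io.StringIO(forms_text))}
    sets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(cognates_text)):
        form = forms[row["Form_ID"]]
        sets[row["Cognateset_ID"]].append((form["Language_ID"], form["Form"]))
    return dict(sets)


cognate_sets = group_cognates(FORMS_CSV, COGNATES_CSV)
for set_id, members in sorted(cognate_sets.items()):
    print(set_id, members)
```

In this toy example the three words for 'father' fall into one cognate set, while the two words for 'dog' each form their own set, which mirrors how a single reference meaning can be split across several cognate sets.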

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff39/12405575/46f7909b1d67/41597_2025_5445_Fig1_HTML.jpg
