Lemmenmeier-Batinić Dolores, Batinić Josip, Escher Anastasia
Slavisches Seminar, University of Zurich, Zurich, Switzerland.
Department of Literature, University of Antwerp, Antwerp, Belgium.
Lang Resour Eval. 2023 Feb 22:1-38. doi: 10.1007/s10579-023-09634-7.
In this paper, we present a corpus of heritage Bosnian/Croatian/Montenegrin/Serbian (BCMS) as spoken in German-speaking Switzerland. The corpus consists of elicited conversations between 29 second-generation speakers originating from different regions of the former Yugoslavia. In total, the corpus contains 30 turn-aligned transcripts with an average length of 6 min. It is enriched with extensive speaker metadata, annotations, and pre-calculated corpus counts. The corpus can be accessed through an interactive corpus platform that allows browsing, querying, and filtering, as well as creating and sharing custom annotations. The principal user groups we address with this corpus are researchers of heritage BCMS, as well as students and teachers of BCMS living in the diaspora. In addition to introducing the corpus platform and the workflows we adopted to create it, we present a case study of the BCMS spoken by a pair of siblings who participated in the map task, and discuss the advantages and challenges of using this corpus platform for linguistic research.