Department of Applied Linguistics, College of Languages, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
College of Computer and Information Sciences, Al-Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia.
PLoS One. 2024 Oct 23;19(10):e0303729. doi: 10.1371/journal.pone.0303729. eCollection 2024.
This article introduces the Saudi Learner Translation Corpus (SauLTC), an innovative multi-version English-Arabic parallel corpus with part-of-speech tagging. We describe the corpus parameters and compilation process and explain how text processing and sentence alignment were conducted. The participants include 366 student translators, 48 instructors, and 23 alignment verifiers. The corpus provides two target versions of every source text (ST), an initial and a final draft, so that changes made during the translation and revision processes can be detected. The translations were collected over three years, yielding 5,160,386 tokens. The metadata of the 23 sentence alignment verifiers were added to the analysis as a unique variable to investigate individual differences in the manual verification process. This unidirectional corpus can be used to identify student translators' strategies and errors and to analyze the efficacy of instructors' feedback. The corpus is accessible via an application and a website, and it provides translation teachers and researchers with a database that can support the development of corpus-based and corpus-driven teaching materials.
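Because each ST is paired with an initial and a final target draft, revisions can in principle be surfaced by comparing the two drafts token by token. The sketch below illustrates one such comparison using Python's standard `difflib.SequenceMatcher`. It assumes, hypothetically, that an aligned unit is available as a `(source, first_draft, final_draft)` triple and that whitespace tokenization is adequate. The `AlignedUnit` record, the `draft_changes` helper, and the example sentences are illustrative inventions, not SauLTC's actual data format, interface, or data.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class AlignedUnit(NamedTuple):
    """Hypothetical record: one source sentence aligned with two target drafts."""
    source: str
    first_draft: str
    final_draft: str


def draft_changes(unit: AlignedUnit) -> list[tuple[str, str, str]]:
    """Return (operation, old_tokens, new_tokens) for every token-level edit
    between the first and final drafts. Unchanged spans are skipped.
    Assumes simple whitespace tokenization (an illustrative simplification)."""
    old = unit.first_draft.split()
    new = unit.final_draft.split()
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only "replace", "insert", and "delete"
            changes.append((tag, " ".join(old[i1:i2]), " ".join(new[j1:j2])))
    return changes


# Hypothetical example: the final draft fixes a number error ("student" ->
# "students") and restores an omitted adjective ("public").
unit = AlignedUnit(
    source="The students went to the public library.",
    first_draft="ذهب الطالب إلى المكتبة",
    final_draft="ذهب الطلاب إلى المكتبة العامة",
)
print(draft_changes(unit))
# → [('replace', 'الطالب', 'الطلاب'), ('insert', '', 'العامة')]
```

Classifying each opcode (replace, insert, delete) is one simple way to tally revision types across many units; a real study would likely need Arabic-aware tokenization rather than whitespace splitting.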