Aleedy Moneerh, Alshihri Fatma, Meshoul Souham, Al-Harthi Maha, Alramlawi Salwa, Aldaihani Badr, Shaiba Hadil, Atwell Eric
Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
School of Computer Science, University of Leeds, Leeds, United Kingdom.
PeerJ Comput Sci. 2025 Mar 31;11:e2788. doi: 10.7717/peerj-cs.2788. eCollection 2025.
Translation education (TE) demands significant effort from educators due to its labor-intensive nature. Developing computational tools powered by artificial intelligence (AI) can alleviate this burden by automating repetitive tasks, allowing instructors to focus on higher-level pedagogical aspects of translation. This integration of AI has the potential to significantly enhance the efficiency and effectiveness of translation education. However, the development of effective AI-based tools for TE is hampered by a lack of high-quality, comprehensive datasets tailored to this specific need, especially for Arabic. While the Saudi Learner Translation Corpus (SauLTC), a unidirectional English-to-Arabic parallel corpus, constitutes a valuable resource, its current format is inadequate for generating the parallel sentences required for a didactic translation corpus. This article proposes leveraging large language models such as the Generative Pre-trained Transformer (GPT) to transform SauLTC into a parallel sentence corpus. Using cosine similarity and human evaluation, we assessed the quality of the generated parallel sentences, achieving promising results with an 85.2% similarity score using Language-agnostic BERT Sentence Embedding (LaBSE) in conjunction with GPT, outperforming the other embedding models investigated. The results demonstrate the potential of AI to address critical dataset challenges in pursuit of effective data-driven solutions to support translation education.
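To make the cosine-similarity evaluation concrete, the sketch below scores aligned English–Arabic sentence pairs by the cosine of the angle between their sentence embeddings and averages the per-pair scores. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how per-pair scores are aggregated, so averaging over pairs is an assumption, and the short toy vectors stand in for real embeddings (such as those LaBSE would produce). The helper names `cosine_similarity` and `mean_pair_similarity` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def mean_pair_similarity(source_embs, target_embs) -> float:
    """Average cosine similarity over aligned (source, target) sentence pairs.

    Assumption: the corpus-level score is the plain mean of per-pair scores.
    """
    if len(source_embs) != len(target_embs) or not source_embs:
        raise ValueError("need the same, non-zero number of source and target embeddings")
    scores = [cosine_similarity(s, t) for s, t in zip(source_embs, target_embs)]
    return sum(scores) / len(scores)


# Toy 3-dimensional vectors standing in for real sentence embeddings
# (e.g. from LaBSE); the values are illustrative, not data from the paper.
english = [[1.0, 0.0, 0.0], [3.0, 4.0, 0.0]]
arabic = [[0.6, 0.8, 0.0], [4.0, 3.0, 0.0]]
print(f"{mean_pair_similarity(english, arabic):.1%}")  # prints 78.0%
```

Here the first pair scores 0.6 and the second 0.96, giving a mean of 0.78. In practice the vectors would come from a multilingual encoder that maps a sentence and its translation to nearby points, which is what makes cosine similarity a reasonable proxy for alignment quality.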