The neural machine translation models for the low-resource Kazakh-English language pair.

Authors

Karyukin Vladislav, Rakhimova Diana, Karibayeva Aidana, Turganbayeva Aliya, Turarbek Asem

Affiliations

Department of Information Systems, Al-Farabi Kazakh National University, Almaty, Kazakhstan.

Institute of Information and Computational Technologies, Almaty, Kazakhstan.

Publication information

PeerJ Comput Sci. 2023 Feb 8;9:e1224. doi: 10.7717/peerj-cs.1224. eCollection 2023.


DOI: 10.7717/peerj-cs.1224
PMID: 37346576
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10280589/
Abstract

The field of machine translation grew out of people's need to communicate globally by automatically translating words, sentences, and texts from one language into another. Neural machine translation has become one of the most significant approaches in recent years. It requires large parallel corpora, which are not available for low-resource languages such as Kazakh, and this makes it difficult for neural machine translation models to achieve high performance. This article explores existing methods for handling low-resource languages, which artificially increase the size of the corpora and improve the performance of Kazakh-English machine translation models: forward translation, backward translation, and transfer learning. It then considers the Sequence-to-Sequence (recurrent neural network and bidirectional recurrent neural network) and Transformer neural machine translation architectures, with their features and specifications, for the experiments in training models on parallel corpora. The experimental part focuses on building models for high-quality translation of formal social, political, and scientific texts. Synthetic parallel sentences are generated from existing monolingual Kazakh data using the forward translation approach and combined with parallel corpora parsed from official government websites. The recurrent neural network (RNN), bidirectional recurrent neural network (BRNN), and Transformer models of the OpenNMT framework are trained on the resulting corpus of 380,000 parallel Kazakh-English sentences. The quality of the trained models is evaluated with the BLEU, WER, and TER metrics, and sample translations are also analyzed. The RNN and BRNN models produced more precise translations than the Transformer model. The Byte-Pair Encoding tokenization technique yielded better metric scores and translations than word tokenization. The BRNN with Byte-Pair Encoding showed the best performance, with 0.49 BLEU, 0.51 WER, and 0.45 TER.
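
To make the reported WER figure easier to interpret, here is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) between a hypothesis and a reference, divided by the reference length. The abstract does not say which evaluation tooling the authors used, so the function name `word_error_rate`, the whitespace tokenization, and the example sentences are illustrative assumptions, not the paper's implementation. TER is related but also counts phrase shifts as edits, which this sketch does not model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences, chosen only to exercise the function.
    print(word_error_rate("a b c d", "a x c"))
```

Lower WER and TER are better, while higher BLEU is better, so the three scores reported for the best model should not be compared with one another directly.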


