

Improving neural machine translation with POS-tag features for low-resource language pairs.

Authors

Hlaing Zar Zar, Thu Ye Kyaw, Supnithi Thepchai, Netisopakul Ponrudee

Affiliations

Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand.

Language and Semantic Technology Research Team, NECTEC, Pathum Thani, 12120, Thailand.

Publication

Heliyon. 2022 Aug 22;8(8):e10375. doi: 10.1016/j.heliyon.2022.e10375. eCollection 2022 Aug.

DOI: 10.1016/j.heliyon.2022.e10375
PMID: 36033261
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9404341/
Abstract

Integrating linguistic features has been widely utilized in statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features in neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) using linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on either the source or target side, or both the source and target sides, and the proposed models are conducted. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce string data or string data with POS tags. A transformer model that utilizes only word vectors was used as the first baseline model for comparison with the proposed models. The second baseline model, an Edit-Based Transformer with Repositioning (EDITOR) model, was also used to compare with our proposed models in addition to the baseline transformer model. The findings of the experiments show that adding linguistic features to the transformer-based models enhances the performance of a neural machine translation in low-resource language pairs. Moreover, the best translation results were yielded using shared-multi-source transformer models with linguistic features resulting in more significant Bilingual Evaluation Understudy (BLEU) scores and character n-gram F-score (chrF) scores than the baseline transformer and EDITOR models.
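The core idea of "adding linguistic features to each word" can be sketched as a factored-input preprocessing step: each token is paired with its POS/UPOS tag before being fed to the model. The snippet below is a minimal illustration, not the authors' pipeline; the `|` separator and the tiny hard-coded tag lookup are assumptions standing in for a real POS tagger for Thai, Myanmar, or English.

```python
# Toy UPOS lookup standing in for a real POS tagger; unknown tokens get "X".
TOY_UPOS = {
    "the": "DET", "cat": "NOUN", "sat": "VERB",
    "on": "ADP", "mat": "NOUN",
}

def attach_pos(tokens):
    """Pair each token with its (toy) UPOS tag, one combined factor
    per position, e.g. 'cat' -> 'cat|NOUN'."""
    return [f"{tok}|{TOY_UPOS.get(tok, 'X')}" for tok in tokens]

sentence = ["the", "cat", "sat", "on", "the", "mat"]
print(attach_pos(sentence))
# ['the|DET', 'cat|NOUN', 'sat|VERB', 'on|ADP', 'the|DET', 'mat|NOUN']
```

Under this framing, the multi-source and shared-multi-source models described above would receive the plain token stream and the tagged stream as two separate inputs, while the single-transformer variants would consume one stream with tags attached on the source side, the target side, or both.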


Figures (gr001–gr010):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/b28ed7a76b06/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/bf7157aa705c/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/1e6494fc15f9/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/cade6d170f7d/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/7e820c5b1dc2/gr005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/067ed9f71cd1/gr006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/85061f709bb7/gr007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/b4b3e5a6af64/gr008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/d6f9763d82bf/gr009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a65/9404341/b5bc670bb487/gr010.jpg

Similar articles

1. Improving neural machine translation with POS-tag features for low-resource language pairs.
   Heliyon. 2022 Aug 22;8(8):e10375. doi: 10.1016/j.heliyon.2022.e10375. eCollection 2022 Aug.
2. A Transformer-Based Neural Machine Translation Model for Arabic Dialects That Utilizes Subword Units.
   Sensors (Basel). 2021 Sep 29;21(19):6509. doi: 10.3390/s21196509.
3. The neural machine translation models for the low-resource Kazakh-English language pair.
   PeerJ Comput Sci. 2023 Feb 8;9:e1224. doi: 10.7717/peerj-cs.1224. eCollection 2023.
4. Heavyweight Statistical Alignment to Guide Neural Translation.
   Comput Intell Neurosci. 2022 Jun 3;2022:6856567. doi: 10.1155/2022/6856567. eCollection 2022.
5. Efficient incremental training using a novel NMT-SMT hybrid framework for translation of low-resource languages.
   Front Artif Intell. 2024 Sep 25;7:1381290. doi: 10.3389/frai.2024.1381290. eCollection 2024.
6. Predicting Generalized Anxiety Disorder From Impromptu Speech Transcripts Using Context-Aware Transformer-Based Neural Networks: Model Evaluation Study.
   JMIR Ment Health. 2023 Mar 28;10:e44325. doi: 10.2196/44325.
7. Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing.
   Neural Netw. 2022 Apr;148:194-205. doi: 10.1016/j.neunet.2022.01.016. Epub 2022 Feb 1.
8. English-Chinese Machine Translation Based on Transfer Learning and Chinese-English Corpus.
   Comput Intell Neurosci. 2022 Sep 27;2022:1563731. doi: 10.1155/2022/1563731. eCollection 2022.
9. Predicting Working Memory in Healthy Older Adults Using Real-Life Language and Social Context Information: A Machine Learning Approach.
   JMIR Aging. 2022 Mar 8;5(1):e28333. doi: 10.2196/28333.
10. Revealing the Roles of Part-of-Speech Taggers in Alzheimer Disease Detection: Scientific Discovery Using One-Intervention Causal Explanation.
   JMIR Form Res. 2023 May 2;7:e36590. doi: 10.2196/36590.