Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision.

Affiliations

College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China.

College of Mathematics and Information Science, Zhengzhou University of Light Industry, Zhengzhou 450000, China.

Publication information

Comput Intell Neurosci. 2022 Aug 3;2022:5296946. doi: 10.1155/2022/5296946. eCollection 2022.

DOI:10.1155/2022/5296946
PMID:35965766
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9365574/
Abstract

Machine translation relies on parallel sentences, and their quantity is an important factor in the performance of machine translation systems, especially for low-resource languages. Recent advances in learning cross-lingual word representations from nonparallel data open up a new possibility: obtaining bilingual sentences with minimal supervision in low-resource languages. In this paper, we introduce a novel methodology for obtaining parallel sentences using only a small bilingual seed lexicon of a few hundred entries. We first obtain bilingual semantic representations by establishing a cross-lingual mapping between monolingual representations via the seed lexicon. We then construct a deep learning classifier to extract bilingual parallel sentences. We demonstrate the effectiveness of our methodology by harvesting Uyghur-Chinese parallel sentences and constructing a machine translation system. The experiments indicate that our method can obtain a large number of high-accuracy parallel sentences for low-resource language pairs.
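The abstract describes two steps: map monolingual word vectors into a shared space using a seed lexicon, then decide which candidate sentence pairs are parallel. The abstract does not specify the exact mapping or classifier, so the following is only a minimal sketch under stated assumptions. It substitutes a standard least-squares linear mapping, W = (X^T X)^-1 X^T Z learned from the seed pairs, and replaces the paper's deep learning classifier with a hypothetical cosine-similarity threshold on averaged sentence vectors. All function names, the 0.8 threshold, and the toy vectors are illustrative assumptions, not the authors' method.

```python
import math
from typing import Optional

Matrix = list[list[float]]
Vectors = dict[str, list[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]  # augmented [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("X^T X is singular: need more independent seed pairs")
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def learn_mapping(seed: list[tuple[str, str]], src: Vectors, tgt: Vectors) -> Matrix:
    """Least-squares W minimising ||X W - Z||^2 over the seed-lexicon pairs.

    X stacks the source word vectors, Z the vectors of their translations;
    the normal equations give W = (X^T X)^-1 X^T Z.
    """
    x = [src[s] for s, _ in seed]
    z = [tgt[t] for _, t in seed]
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, z))


def sentence_vector(tokens: list[str], vecs: Vectors, w: Optional[Matrix] = None):
    """Average the known word vectors; optionally project into the target space."""
    rows = [vecs[t] for t in tokens if t in vecs]
    if not rows:
        return None
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    # W is linear, so mapping the mean equals averaging the mapped word vectors.
    return matmul([mean], w)[0] if w is not None else mean


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_parallel(src_tokens: list[str], tgt_tokens: list[str], src: Vectors,
                tgt: Vectors, w: Matrix, threshold: float = 0.8) -> bool:
    """Hypothetical stand-in for the paper's classifier: a cosine threshold."""
    s = sentence_vector(src_tokens, src, w)
    t = sentence_vector(tgt_tokens, tgt)
    if s is None or t is None:
        return False
    return cosine(s, t) >= threshold
```

With toy 2-D vectors where the target space is the source space rotated by 90 degrees, a two-entry seed lexicon such as `[("a", "A"), ("b", "B")]` recovers the rotation, and `is_parallel(["c"], ["C"], src, tgt, W)` then accepts a pair whose words were never in the seed. With real embeddings, the normal equations need at least as many linearly independent seed pairs as embedding dimensions, so with a lexicon of a few hundred entries, adding regularization or an orthogonality constraint to the mapping is a common choice.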



Similar articles

1. Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision.
   Comput Intell Neurosci. 2022 Aug 3;2022:5296946. doi: 10.1155/2022/5296946. eCollection 2022.
2. Extracting Parallel Sentences from Nonparallel Corpora Using Parallel Hierarchical Attention Network.
   Comput Intell Neurosci. 2020 Sep 1;2020:8823906. doi: 10.1155/2020/8823906. eCollection 2020.
3. Evaluating a pivot-based approach for bilingual lexicon extraction.
   Comput Intell Neurosci. 2015;2015:434153. doi: 10.1155/2015/434153. Epub 2015 Apr 23.
4. Differences in semantic and translation priming across languages: the role of language direction and language dominance.
   Mem Cognit. 2007 Jul;35(5):953-65. doi: 10.3758/bf03193468.
5. A cross-lingual similarity measure for detecting biomedical term translations.
   PLoS One. 2015 Jun 1;10(6):e0126196. doi: 10.1371/journal.pone.0126196. eCollection 2015.
6. Machine translation: Turkish-English bilingual speakers' accuracy detection of evidentiality and preference of MT.
   Cogn Res Princ Implic. 2024 Feb 16;9(1):10. doi: 10.1186/s41235-024-00535-z.
7. The effect of childhood bilingualism on episodic and semantic memory tasks.
   Scand J Psychol. 2008 Apr;49(2):93-109. doi: 10.1111/j.1467-9450.2008.00633.x.
8. Processing of Translation-Ambiguous Words by Chinese-English Bilinguals in Sentence Context.
   J Psycholinguist Res. 2019 Oct;48(5):1133-1161. doi: 10.1007/s10936-019-09650-1.
9. Commonality of neural representations of sentences across languages: Predicting brain activation during Portuguese sentence comprehension using an English-based model of brain function.
   Neuroimage. 2017 Feb 1;146:658-666. doi: 10.1016/j.neuroimage.2016.10.029. Epub 2016 Oct 19.
10. A deep learning approach to bilingual lexicon induction in the biomedical domain.
   BMC Bioinformatics. 2018 Jul 9;19(1):259. doi: 10.1186/s12859-018-2245-8.