Shah Tamkeen Zehra, Imran Muhammad, Ismail Sayed M
Institute of Space Technology, Islamabad, Pakistan.
Prince Sultan University, Saudi Arabia; The University of Sahiwal, Pakistan.
Heliyon. 2023 Nov 29;10(1):e22883. doi: 10.1016/j.heliyon.2023.e22883. eCollection 2024 Jan 15.
Machine translation yields only marginal accuracy for low-resource languages, but its deep learning models are expected to become more accurate over time. This longitudinal study investigates how Google Translate's Urdu-to-English output evolved between 2018 and 2021. Accuracy and acceptability of the translations were determined by (a) an interlinear gloss that identifies the core semantic units and grammatical functions to be translated, and (b) a descriptive comparison of the translated text's syntactic and semantic properties with those of the source text. Despite a 50% error rate that persists over the three-year interval, the study reports significant improvement in the overall intelligibility of the translations, in contrast to the initial 2018 results, which exhibited rampant non-localized errors. Working backwards from individual errors to the morphosyntactic and semantic patterns underlying them, the study concludes that Urdu's pro-drop feature, its case-marking system, the identification of clause boundaries, polysemous terms, and orthographically similar words pose the greatest difficulty for neural machine translation. These results point to the need to incorporate syntactic information in training data.