Han Aaron L-F, Wong Derek F, Chao Lidia S, He Liangye, Lu Yi
Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory, Department of Computer and Information Science, University of Macau, Macau.
ScientificWorldJournal. 2014;2014:760301. doi: 10.1155/2014/760301. Epub 2014 Apr 28.
With the rapid development of machine translation (MT), MT evaluation has become increasingly important for promptly telling us whether an MT system is making progress. Conventional MT evaluation methods calculate the similarity between hypothesis translations produced by automatic translation systems and reference translations produced by professional translators. Existing evaluation metrics have several weaknesses. First, incomprehensively designed factors cause a language-bias problem: a metric performs well on certain language pairs but poorly on others. Second, existing metrics tend to use either no linguistic features or too many; using none draws criticism from linguists, while using too many harms the model's repeatability. Third, the required reference translations are expensive to produce and sometimes unavailable in practice. In this paper, the authors propose an unsupervised MT evaluation metric that uses a universal part-of-speech (POS) tagset and does not rely on reference translations. The authors also explore the performance of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation scores with human judgments.
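The general idea of reference-free evaluation via universal POS tags can be illustrated with a minimal sketch: tag the source sentence and the hypothesis translation with a shared universal tagset, then score the overlap of their POS n-grams. The tag sequences and the n-gram F-score below are illustrative assumptions for exposition, not the authors' exact formulation.

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Count the n-grams in a sequence of universal POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def pos_fscore(src_tags, hyp_tags, n=2):
    """Illustrative reference-free score: F-measure over the overlap
    of universal POS n-grams between source and hypothesis."""
    src, hyp = pos_ngrams(src_tags, n), pos_ngrams(hyp_tags, n)
    overlap = sum((src & hyp).values())  # clipped n-gram matches
    if not src or not hyp:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(src.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical universal POS tag sequences for a source/hypothesis pair
source = ["DET", "NOUN", "VERB", "DET", "NOUN"]
hypothesis = ["DET", "NOUN", "VERB", "NOUN"]
print(round(pos_fscore(source, hypothesis, n=2), 3))  # → 0.571
```

Because both sequences are drawn from the same small universal tagset, this kind of comparison sidesteps the need for reference translations, at the cost of measuring only syntactic plausibility rather than semantic adequacy.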