Olga Majewska, Ivan Vulić, Diana McCarthy, Yan Huang, Akira Murakami, Veronika Laippala, Anna Korhonen
1 Language Technology Lab (LTL), Department of Theoretical and Applied Linguistics (DTAL), University of Cambridge, 9 West Road, Cambridge, CB3 9DP UK.
2 Department of French Studies, University of Turku, 20014 Turku, Finland.
Lang Resour Eval. 2018;52(3):771-799. doi: 10.1007/s10579-017-9403-x. Epub 2017 Oct 20.
VerbNet, the most extensive online verb lexicon currently available for English, has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for only a few languages. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different, typologically diverse languages. Our study aims to fill this gap. We develop a systematic method for translating VerbNet classes from English to other languages, which we first apply to Polish and subsequently to Croatian, Mandarin, Japanese, Italian, and Finnish. Our results on Polish demonstrate high translatability across all the classes (96% of English member verbs were successfully translated into Polish) and strong inter-annotator agreement, revealing a promising degree of overlap in the resultant classifications. The results on the other languages are equally promising. This demonstrates that VerbNet classes have strong cross-lingual potential and that the proposed method could be applied to obtain gold standards for automatic verb classification in different languages. We make our annotation guidelines and the six language-specific verb classifications available with this paper.
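The 96% translatability figure reported for Polish is a simple ratio of translated to total English member verbs. The sketch below shows one way such a ratio could be computed, both pooled over all classes and per class. It assumes that a member verb counts as "translated" when annotators propose at least one target-language equivalent. The data layout, the function names `translatability` and `per_class_translatability`, and the toy verbs are hypothetical illustrations, not the authors' tooling or data.

```python
from typing import Dict, List

# Hypothetical layout: class id -> {English member verb -> proposed translations}.
# An empty list means the annotator found no acceptable equivalent.
ClassTranslations = Dict[str, Dict[str, List[str]]]


def translatability(classes: ClassTranslations) -> float:
    """Share of English member verbs, pooled over all classes, that received
    at least one translation (assumed definition, for illustration only)."""
    total = 0
    translated = 0
    for members in classes.values():
        for targets in members.values():
            total += 1
            if targets:
                translated += 1
    return translated / total if total else 0.0


def per_class_translatability(classes: ClassTranslations) -> Dict[str, float]:
    """The same ratio computed separately for each class."""
    return {
        name: (sum(1 for t in members.values() if t) / len(members)) if members else 0.0
        for name, members in classes.items()
    }


if __name__ == "__main__":
    # Toy Polish example: VerbNet-style class ids, but the member lists and
    # translations are illustrative and not taken from the released data.
    toy = {
        "run-51.3.2": {"run": ["biec"], "jog": ["truchtać"], "scurry": []},
        "break-45.1": {"break": ["łamać", "złamać"], "shatter": ["roztrzaskać"]},
    }
    print(f"{translatability(toy):.0%}")  # 4 of 5 member verbs translated -> 80%
    print(per_class_translatability(toy))
```

Pooling over all member verbs weights large classes more heavily than small ones; the per-class variant makes it easier to spot individual classes that translate poorly.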