Benko Lubomír, Munkova Dasa, Pappová Mária, Munk Michal
Department of Computer Science, Constantine the Philosopher University in Nitra, Nitra, Slovakia.
Science and Research Centre, University of Pardubice, Pardubice, Czech Republic.
PeerJ Comput Sci. 2024 May 24;10:e2026. doi: 10.7717/peerj-cs.2026. eCollection 2024.
Morphological tagging provides essential insight into the grammar and structure of a sentence and into the relationships between its words. Tagging text in a highly inflectional language is challenging because of word ambiguity. This research compares six automatic taggers for the inflectional Slovak language, seeking the most accurate tagger for literary and non-literary texts. Our results indicate that it is useful to divide texts into literary and non-literary ones and then to choose a tagger based on the text style. For literary texts, UDPipe2 outperformed the other taggers in seven of the nine examined tagset positions. Conversely, for non-literary texts, RNNTagger achieved the highest performance in eight of the nine examined tagset positions. RNNTagger is recommended for both types of text because it best captures the inflection of the Slovak language, although UDPipe2 achieves higher accuracy on literary texts. Despite the limited dataset size, this study highlights the suitability of various taggers for inflectional languages such as Slovak.
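Comparing taggers "per tagset position" assumes positional tags, where each character of a tag encodes one grammatical category such as part of speech, gender, number, or case. Below is a minimal sketch of one simple way to compute per-position accuracy and count the positions where one tagger beats another. It is not necessarily the authors' evaluation procedure. It assumes gold and predicted tags are already aligned token by token. The function names and the tag strings, written in the style of a Slovak positional tagset, are hypothetical and for illustration only.

```python
from collections import Counter


def per_position_accuracy(gold_tags, predicted_tags):
    """Accuracy for each character position of positional tags.

    Assumption: the two sequences are aligned token by token, and the
    i-th character of a tag encodes one grammatical category. A position
    missing from a shorter predicted tag counts as an error.
    """
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must be aligned")
    correct = Counter()
    total = Counter()
    for gold, pred in zip(gold_tags, predicted_tags):
        for i, g in enumerate(gold):
            total[i] += 1
            if i < len(pred) and pred[i] == g:
                correct[i] += 1
    return {i: correct[i] / total[i] for i in sorted(total)}


def positions_won(acc_a, acc_b):
    """Number of positions where tagger A is strictly more accurate than B."""
    return sum(1 for i in acc_a if acc_a[i] > acc_b.get(i, 0.0))


# Hypothetical gold tags and outputs of two hypothetical taggers.
gold = ["SSfs1", "AAfs1x", "VKesc+"]
pred_a = ["SSfs1", "AAfs4x", "VKesc+"]  # one error at position 4 (case)
pred_b = ["SSms1", "AAfs1x", "VKesa+"]  # errors at positions 2 and 4

acc_a = per_position_accuracy(gold, pred_a)
acc_b = per_position_accuracy(gold, pred_b)
print(positions_won(acc_a, acc_b), positions_won(acc_b, acc_a))
```

Here tagger A wins one position (gender, index 2), ties at index 4, and tagger B wins none. A tally of this kind, run separately on literary and non-literary texts, is what a statement such as "seven out of nine positions" summarizes.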