Hirpassa Sintayehu, Lehal G S
Department of Computer Science, Adama Science and Technology University, Ethiopia.
Department of Computer Science, Punjabi University, India.
Heliyon. 2023 Jun 21;9(7):e17175. doi: 10.1016/j.heliyon.2023.e17175. eCollection 2023 Jul.
To date, several part-of-speech (POS) taggers have been introduced to support semantic analysis in different languages. However, POS tagging is more difficult in morphologically complex languages such as Amharic. In this paper, we evaluated several models for Amharic POS tagging: bidirectional long short-term memory, a convolutional neural network combined with bidirectional long short-term memory, and a conditional random field. Various features, both language-dependent and language-independent, were explored in the conditional random field model. In addition, word-level and character-level features were analyzed in the deep neural network models. A convolutional neural network was used to encode features at the word and character level. Each model's performance was evaluated on a dataset containing 321 K tokens, manually tagged with 31 POS tags. The best performance, 97.23% accuracy, was obtained by an end-to-end deep neural network model that combines a convolutional neural network with bidirectional long short-term memory and a conditional random field. This is the highest accuracy reported for the Amharic POS tagging task and is competitive with contemporary taggers for other languages.
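The abstract does not list the features used in the conditional random field model, so the following is only a minimal sketch of what language-independent token features for a CRF tagger might look like. The function names (`token_features`, `sentence_features`), the specific features (affixes up to three characters, a one-token context window), and the placeholder tokens are all assumptions made for illustration, not the authors' feature set. The output is a list of feature dictionaries per sentence, a format that CRF toolkits such as sklearn-crfsuite accept.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, max_affix: int = 3) -> Dict[str, object]:
    """Build a feature dictionary for tokens[i].

    All features here are hypothetical examples of language-independent
    features; they are not taken from the paper.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word": word,
        "length": len(word),
        "is_digit": word.isdigit(),
    }
    # Character prefixes and suffixes act as a rough stand-in for morphology.
    for n in range(1, min(max_affix, len(word)) + 1):
        feats[f"prefix{n}"] = word[:n]
        feats[f"suffix{n}"] = word[-n:]
    # Context window of one token on each side, with sentence-boundary flags.
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Placeholder tokens, not real Amharic data.
    for feats in sentence_features(["abc", "defgh", "42"]):
        print(feats)
```

Language-dependent features (for example, affix lists specific to Amharic morphology) would be added to the same dictionary; they are omitted here because the abstract does not specify them.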