Faculty of Computer System and Software Engineering, Universiti Malaysia Pahang (UMP), Pahang, Malaysia.
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia.
PLoS One. 2018 Apr 23;13(4):e0194852. doi: 10.1371/journal.pone.0194852. eCollection 2018.
Sentiment analysis techniques are increasingly exploited to categorize opinion text into one or more predefined sentiment classes for the creation and automated maintenance of review-aggregation websites. In this paper, a Malay sentiment analysis classification model is proposed to improve classification performance based on semantic orientation and machine learning approaches. First, a total of 2,478 Malay sentiment-lexicon phrases and words are each assigned a synonym and stored with the help of more than one native Malay speaker, and each entry is manually assigned a polarity score. In addition, supervised machine learning approaches and a lexicon knowledge method are combined for Malay sentiment classification, and thirteen features are evaluated. Finally, three individual classifiers and a combined classifier are used to evaluate classification accuracy. A wide range of comparative experiments is conducted on a Malay Reviews Corpus (MRC), and the results demonstrate that feature extraction improves the performance of Malay sentiment analysis based on the combined classification. However, the results depend on three factors: the features, the number of features, and the classification approach.
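The abstract does not specify the polarity scale, the thirteen features, or the three classifiers and how they are combined. As an illustration only, the following Python sketch assumes a tiny hypothetical Malay lexicon with integer polarity scores plus a synonym map, derives a few lexicon-based features of the kind a supervised classifier could consume, and combines the labels of three classifiers by simple majority voting. The names `LEXICON`, `SYNONYMS`, `lexicon_features`, and `majority_vote` are hypothetical, and the paper's actual features and combination rule may differ.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> manually assigned polarity score.
# This is NOT the paper's 2,478-entry lexicon; the scale here is an assumption.
LEXICON = {"bagus": 2, "baik": 1, "buruk": -2, "teruk": -2}

# Hypothetical synonym map: a synonym is scored via its lexicon head word.
SYNONYMS = {"hebat": "bagus", "elok": "baik"}


def lexicon_features(text):
    """Return illustrative lexicon-derived features for one review.

    pos_count / neg_count count tokens with positive / negative polarity,
    and so_score is the summed semantic-orientation score.
    """
    pos = neg = score = 0
    for token in text.lower().split():
        head = SYNONYMS.get(token, token)   # map a synonym to its head word
        polarity = LEXICON.get(head, 0)     # unknown words contribute 0
        score += polarity
        if polarity > 0:
            pos += 1
        elif polarity < 0:
            neg += 1
    return {"pos_count": pos, "neg_count": neg, "so_score": score}


def majority_vote(predictions):
    """Combine labels from individual classifiers by simple majority voting.

    With three classifiers and two classes a strict majority always exists.
    """
    label, _ = Counter(predictions).most_common(1)[0]
    return label


if __name__ == "__main__":
    print(lexicon_features("makanan bagus tetapi servis teruk"))
    print(majority_vote(["positive", "negative", "positive"]))
```

In a full pipeline, features such as these would be concatenated with other text features and fed to each individual classifier; majority voting is only one possible way to build the combined classifier.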