Asghar Muhammad Zubair, Khan Aurangzeb, Ahmad Shakeel, Khan Imran Ali, Kundi Fazal Masud
Institute of Computing and Information Technology (ICIT), Gomal University, Dera Ismail Khan, Pakistan.
Institute of Engineering and Computer Science, University of Science and Technology, Bannu, Pakistan.
PLoS One. 2015 Oct 14;10(10):e0140204. doi: 10.1371/journal.pone.0140204. eCollection 2015.
The exponential growth of Web-based user-generated reviews has led to the emergence of Opinion Mining (OM) applications for analyzing users' opinions toward products, services, and policies. Polarity lexicons often play a pivotal role in OM, indicating whether a term is positive or negative along with a numeric score. However, the commonly available domain-independent lexicons are not an optimal choice for every domain in which OM is applied, because the polarity of a term changes from one domain to another and such lexicons do not contain the correct polarity of a term for every domain. In this work, we focus on the problem of adapting a domain-dependent polarity lexicon from a set of labeled user reviews and a domain-independent lexicon, and we propose a unified learning framework based on information theory concepts that assigns terms the correct polarity (positive or negative) scores. Benchmarking on three datasets (car, hotel, and drug reviews) shows that our approach improves polarity classification performance by achieving higher accuracy. Moreover, using the derived domain-dependent lexicon changed the polarity of terms, and the experimental results show that our approach is more effective than the baseline methods.
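The abstract does not state which information-theoretic measure the framework relies on. As an illustration only, and not the authors' method, the Python sketch below assumes a pointwise mutual information (PMI) difference as the measure: a term's domain score is PMI(term, positive) minus PMI(term, negative), which reduces to log2 of P(term | positive) / P(term | negative). The function name `adapt_lexicon`, the parameters `min_count` and `smoothing`, and the toy reviews and general-lexicon scores are all hypothetical and invented for this example.

```python
import math
from collections import Counter


def adapt_lexicon(reviews, general_lexicon, min_count=2, smoothing=1.0):
    """Derive a domain-dependent lexicon (illustrative sketch only).

    reviews: iterable of (text, label) pairs, label being "pos" or "neg".
    general_lexicon: dict mapping term -> domain-independent score.
    Returns a dict mapping term -> score, where score > 0 means positive.
    """
    term_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in reviews:
        doc_counts[label] += 1
        # Count each term at most once per review (document frequency).
        term_counts[label].update(set(text.lower().split()))

    # Start from the general scores; terms unseen in the domain keep them.
    adapted = dict(general_lexicon)
    vocab = set(term_counts["pos"]) | set(term_counts["neg"])
    for term in vocab:
        n_pos = term_counts["pos"][term]
        n_neg = term_counts["neg"][term]
        if n_pos + n_neg < min_count:
            continue  # too little domain evidence to override anything
        # Smoothed estimates of P(term | pos) and P(term | neg).
        p_pos = (n_pos + smoothing) / (doc_counts["pos"] + 2 * smoothing)
        p_neg = (n_neg + smoothing) / (doc_counts["neg"] + 2 * smoothing)
        # PMI(term, pos) - PMI(term, neg) = log2(P(term|pos) / P(term|neg)).
        adapted[term] = math.log2(p_pos / p_neg)
    return adapted


# Toy, made-up data: "small" is negative in the general lexicon,
# but appears only in positive car reviews here.
toy_reviews = [
    ("small car easy to park", "pos"),
    ("small and cheap to run", "pos"),
    ("engine is loud", "neg"),
    ("loud cabin noise", "neg"),
]
general = {"small": -0.3, "loud": -0.5, "good": 0.6}

adapted = adapt_lexicon(toy_reviews, general)
print(adapted["small"], adapted["loud"], adapted["good"])
```

In this toy run the score for "small" flips from negative to positive, "loud" stays negative, and "good", which never occurs in the reviews, keeps its general-lexicon score. A real pipeline would also need proper tokenization and stop-word handling, since the whitespace split here treats function words such as "to" like any other term.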