Asghar Muhammad Zubair, Ahmad Shakeel, Qasim Maria, Zahra Syeda Rabail, Kundi Fazal Masud
Institute of Computing and Information Technology, Gomal University, D.I. Khan, KP Pakistan.
Faculty of Computing and Information Technology in Rabigh (FCITR), King Abdul Aziz University (KAU), Jedda, Saudi Arabia.
Springerplus. 2016 Jul 20;5(1):1139. doi: 10.1186/s40064-016-2809-x. eCollection 2016.
The exponential increase in health-related online reviews has played a pivotal role in the development of sentiment analysis systems for extracting and analyzing user-generated health reviews about a drug or medication. Existing general-purpose opinion lexicons, such as SentiWordNet, have limited coverage of health-related terms, which creates problems for the development of health-based sentiment analysis applications. In this work, we present a hybrid approach to creating a health-related, domain-specific lexicon for the efficient classification and scoring of health-related user sentiment. The proposed approach is based on a bootstrapping model, a dataset of health reviews, and corpus-based sentiment detection and scoring. In each iteration, the vocabulary of the lexicon is updated automatically from an initial seed cache, irrelevant words are filtered out, words are declared as medical or non-medical entries, and finally a sentiment class and score are assigned to each word. The results obtained demonstrate the efficacy of the proposed technique.
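The iteration described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the review format (text paired with a +1/-1 polarity label), the stopword filter, the medical term list, and the scoring formula (positive minus negative co-occurrence counts, normalised by their sum) are all assumptions chosen for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,!?;:").lower() for w in text.split())
    return [t for t in tokens if t]


def label_of(score):
    """Map a numeric score to a sentiment class (assumed thresholds)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def bootstrap_lexicon(reviews, seeds, medical_terms, stopwords, iterations=2):
    """Grow a lexicon from seed words (hypothetical, simplified procedure).

    reviews       : list of (text, polarity) pairs, polarity is +1 or -1
    seeds         : dict mapping seed word -> score in [-1, 1]
    medical_terms : set of words treated as medical entries
    stopwords     : set of words filtered out as irrelevant
    """
    lexicon = {
        w: {"score": s, "label": label_of(s), "medical": w in medical_terms}
        for w, s in seeds.items()
    }
    docs = [(set(tokenize(text)), polarity) for text, polarity in reviews]

    for _ in range(iterations):
        pos, neg = Counter(), Counter()
        for words, polarity in docs:
            # Only reviews that share a word with the current lexicon
            # contribute new candidate words.
            if words.isdisjoint(lexicon):
                continue
            for w in words:
                # Filter irrelevant or already known words.
                if w in lexicon or w in stopwords or not w.isalpha():
                    continue
                (pos if polarity > 0 else neg)[w] += 1

        candidates = set(pos) | set(neg)
        if not candidates:
            break  # nothing new was found, so stop early
        for w in candidates:
            # Corpus-based score from the polarity of the reviews the word
            # appeared in (an assumed formula).
            score = (pos[w] - neg[w]) / (pos[w] + neg[w])
            lexicon[w] = {
                "score": score,
                "label": label_of(score),
                "medical": w in medical_terms,
            }
    return lexicon


# Toy data, invented for illustration only.
reviews = [
    ("Great relief from the migraine", 1),
    ("Terrible nausea after the dose", -1),
    ("Relief was quick", 1),
    ("Nausea and dizziness", -1),
]
lexicon = bootstrap_lexicon(
    reviews,
    seeds={"great": 1.0, "terrible": -1.0},
    medical_terms={"migraine", "nausea", "dizziness", "dose"},
    stopwords={"the", "from", "after", "was", "and"},
)
```

In this toy run, "relief" and "nausea" enter the lexicon in the first pass because they co-occur with the seeds, and "quick" and "dizziness" are reached only in the second pass through those newly added words, which is the bootstrapping behaviour the abstract describes.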