Bhamare Bhavana R, Prabhu Jeyanthi
Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamilnadu, India.
Department of Information Technology, Sathyabama Institute of Science and Technology, Chennai, Tamilnadu, India.
PeerJ Comput Sci. 2021 Feb 5;7:e347. doi: 10.7717/peerj-cs.347. eCollection 2021.
Due to the rapid growth of the Web, people post reviews of products, movies and places they visit on social media. These reviews help customers as well as product owners, who can evaluate their products based on different reviews. Analyzing structured data is easier than analyzing unstructured data, and reviews are available in an unstructured format. Aspect-Based Sentiment Analysis mines the aspects of a product from the reviews and further determines the sentiment for each aspect. In this work, two methods for aspect extraction are proposed. The datasets used for this work are the SemEval restaurant review dataset and the Yelp and Kaggle datasets. In the first method, a multivariate filter-based approach for feature selection is proposed. This method helps select significant features and reduces redundancy among the selected features. It improves the F1-score compared with a method that uses only relevant features selected using Term Frequency weight. In the second method, selective dependency relations are used to extract features; this is done using the Stanford NLP parser. The results obtained with features extracted by selective dependency rules are better than those obtained with features extracted by all dependency rules. In the hybrid approach, both lemma features and selective dependency relation based features are extracted. Using the hybrid feature set, an accuracy of 94.78% and an F1-score of 85.24% are achieved in the aspect category prediction task.
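The abstract does not spell out the multivariate filter criterion, so the following is only a minimal sketch of the general idea (keep features that are relevant to the aspect category while penalizing redundancy with features already chosen). It assumes binary term-presence features over lemmatized review sentences and swaps in an mRMR-style (minimum redundancy, maximum relevance) score based on mutual information; this is not necessarily the authors' formulation, and the function names `mutual_information` and `select_features` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def select_features(docs, labels, k):
    """Hypothetical mRMR-style greedy filter: relevance minus mean redundancy.

    docs:   list of token lists (e.g. lemmas of each review sentence)
    labels: aspect category of each doc
    """
    doc_sets = [set(doc) for doc in docs]
    vocab = sorted(set().union(*doc_sets))
    # Binary presence vector of each term across all documents.
    presence = {t: [int(t in d) for d in doc_sets] for t in vocab}
    # Relevance: how much a term's presence tells us about the category.
    relevance = {t: mutual_information(presence[t], labels) for t in vocab}

    selected = []
    candidates = list(vocab)

    def score(t):
        if not selected:
            return relevance[t]
        # Redundancy: mean MI between the candidate and the terms already chosen.
        redundancy = sum(
            mutual_information(presence[t], presence[s]) for s in selected
        ) / len(selected)
        return relevance[t] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy sentences: "pizza" and "crust" always co-occur, so they are redundant.
    reviews = [
        ["pizza", "crust"], ["pizza", "crust"],
        ["waiter", "rude"], ["waiter", "slow"],
        ["music", "loud"], ["music", "quiet"],
    ]
    categories = ["food", "food", "service", "service", "ambience", "ambience"]
    # Picks one of "pizza"/"crust" plus "waiter" and "music".
    print(select_features(reviews, categories, 3))
```

A relevance-only filter (the role played by Term Frequency weight in the baseline) would rank "pizza" and "crust" equally high and could keep both; the redundancy term makes the second one nearly worthless once the first is selected, which is the effect the multivariate approach aims for.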