Santhiran Rajeswary, Varathan Kasturi Dewi, Chiam Yin Kia
Department of Information Systems, Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia.
Department of Software Engineering, Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia.
PeerJ Comput Sci. 2024 Jan 31;10:e1821. doi: 10.7717/peerj-cs.1821. eCollection 2024.
Opinion mining is attracting significant research interest because it offers a better way to understand customers, their sentiments toward a service or product, and their purchasing decisions. However, extracting every opinion feature from unstructured customer review documents is challenging, especially since these reviews are often written in native languages and contain grammatical and spelling errors. Moreover, existing pattern rules frequently exclude features and opinion words that are not strictly nouns or adjectives. Selecting suitable features when analyzing customer reviews is therefore key to uncovering customers' actual expectations. This study aims to enhance the performance of explicit feature extraction from product review documents. To achieve this, an approach that employs sequential pattern rules is proposed to identify and extract features with their associated opinions. The improved rule set comprises 41 pattern rules: 16 new rules introduced in this study and 25 existing rules from previous research. Averaged over the testing results of five datasets, incorporating the 16 new rules significantly improved feature extraction precision by 6%, recall by 6%, and F-measure by 5% compared to the contemporary approach. The new rules proved effective in extracting features that were previously overlooked, thus achieving the objective of addressing gaps in existing rules. Overall, the study enhanced feature extraction results, yielding an average precision of 0.91, an average recall of 0.88, and an average F-measure of 0.89.
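To make the idea of sequential pattern rules concrete, the following is a minimal sketch rather than the study's actual rule set. It assumes reviews have already been tokenized and tagged with Penn Treebank part-of-speech tags, and the three rules in `RULES`, the names `extract_pairs` and `evaluate`, and the sample sentence are all hypothetical illustrations; the abstract does not list the 41 rules. It also shows how precision, recall, and F-measure are conventionally computed from extracted versus annotated features.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]            # (word, Penn Treebank POS tag)
Slot = Tuple[str, Optional[str]]   # (tag prefix, role: "feature", "opinion" or None)

# Hypothetical rules for illustration only; NOT the 41 rules of the study.
RULES: List[List[Slot]] = [
    [("JJ", "opinion"), ("NN", "feature")],                              # "poor battery"
    [("NN", "feature"), ("VB", None), ("JJ", "opinion")],                # "screen is sharp"
    [("NN", "feature"), ("VB", None), ("RB", None), ("JJ", "opinion")],  # "screen is very sharp"
]


def extract_pairs(tagged: List[Token]) -> List[Tuple[str, str]]:
    """Slide every rule over the tagged sentence and collect (feature, opinion) pairs."""
    pairs = []
    for start in range(len(tagged)):
        for rule in RULES:
            window = tagged[start:start + len(rule)]
            if len(window) < len(rule):
                continue  # rule does not fit in the remaining tokens
            # A slot matches when the token's tag starts with the slot's prefix,
            # so "NN" also covers NNS and "VB" also covers VBZ, VBD, etc.
            if all(tag.startswith(prefix) for (_, tag), (prefix, _) in zip(window, rule)):
                feature = next(w for (w, _), (_, role) in zip(window, rule) if role == "feature")
                opinion = next(w for (w, _), (_, role) in zip(window, rule) if role == "opinion")
                pairs.append((feature.lower(), opinion.lower()))
    return pairs


def evaluate(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for extracted vs. annotated features."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hand-tagged sample sentence (an assumption; a real pipeline would use a POS tagger).
sentence: List[Token] = [
    ("The", "DT"), ("screen", "NN"), ("is", "VBZ"), ("very", "RB"),
    ("sharp", "JJ"), ("but", "CC"), ("poor", "JJ"), ("battery", "NN"),
]
pairs = extract_pairs(sentence)
print(pairs)  # [('screen', 'sharp'), ('battery', 'poor')]

features = {feature for feature, _ in pairs}
print(evaluate(features, {"screen", "battery", "camera"}))
```

In this sketch a rule is simply an ordered list of tag prefixes with optional roles, so covering a pattern that earlier rules miss amounts to appending another list to `RULES`. The reported figures in the abstract are averages over five datasets, which is not necessarily the same as applying the F-measure formula to the averaged precision and recall.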