Zhang Changlu, Fan Haojie, Zhang Jian, Yang Qiong, Tang Liqian
School of Economics & Management, Beijing Information Science & Technology University, Beijing 100192, China.
Beijing Key Lab of Green Development Decision Based on Big Data, Beijing 100192, China.
Entropy (Basel). 2023 Jun 13;25(6):935. doi: 10.3390/e25060935.
Sentiment analysis is currently a research hotspot in many fields, including computer science and statistics. Topic discovery in the text sentiment analysis literature aims to give scholars a quick and effective understanding of the field's research trends. In this paper, we propose a new model for topic discovery analysis of the literature. First, the FastText model is used to compute word vectors for the literature keywords, and the cosine similarity between these vectors is used to measure keyword similarity and merge synonymous keywords. Second, a hierarchical clustering method based on the Jaccard coefficient is used to cluster the domain literature, and the number of articles in each topic is counted. Third, the information gain method is used to extract the characteristic words with high information gain for each topic, from which the meaning of each topic is summarized. Finally, a time series analysis of the literature is used to construct a four-quadrant matrix of topic distribution, which compares the research trends of each topic across different stages. The 1186 articles in the field of text sentiment analysis published from 2012 to 2022 are divided into 12 categories. Comparing the topic distribution matrices of the two stages, 2012 to 2016 and 2017 to 2022, shows that research on each topic category changed markedly from one stage to the next. The results show that: ① among the 12 categories, online opinion analysis of social media comments, represented by microblogs, is one of the current hot topics; ② the integration and application of methods such as sentiment lexicons, traditional machine learning and deep learning should be strengthened; ③ semantic disambiguation in aspect-level sentiment analysis is one of the difficult problems currently facing the field; ④ research on multimodal and cross-modal sentiment analysis should be promoted.
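The keyword-merging step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the keyword vectors have already been computed (in practice they could come from a trained FastText model, for example `model.wv[keyword]` in gensim), it uses made-up 3-dimensional toy vectors, and the similarity threshold of 0.9 and the `merge_synonyms` helper are hypothetical choices. Pairs at or above the threshold are merged transitively with a union-find structure, and the first-listed keyword of each group becomes its representative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_synonyms(vectors, threshold=0.9):
    """Map each keyword to a canonical representative (hypothetical helper).

    vectors: dict keyword -> embedding (assumed to be FastText word vectors).
    threshold: assumed cut-off; keywords whose cosine similarity reaches it
    are treated as synonyms and merged transitively.
    """
    words = list(vectors)
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # keep the keyword that appeared first as the representative
                if words.index(ra) <= words.index(rb):
                    parent[rb] = ra
                else:
                    parent[ra] = rb
    return {w: find(w) for w in words}


# Toy vectors standing in for FastText embeddings (made-up numbers).
vectors = {
    "sentiment analysis": [0.90, 0.10, 0.00],
    "sentiment classification": [0.88, 0.12, 0.01],
    "deep learning": [0.00, 0.20, 0.95],
}
mapping = merge_synonyms(vectors, threshold=0.9)

# Rewrite each article's keyword list using the canonical keywords.
articles = [["sentiment classification", "deep learning"], ["sentiment analysis"]]
merged = [sorted({mapping.get(k, k) for k in kws}) for kws in articles]
print(merged)
```

After merging, both articles share the keyword "sentiment analysis", which is what lets the later set-based clustering step treat them as related.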
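The Jaccard-based hierarchical clustering step might look like the sketch below. The abstract does not state the linkage criterion or the stopping rule, so average linkage and a fixed target number of clusters are assumptions here, as are the toy keyword sets and the hypothetical `agglomerative_jaccard` name. Each article is represented by its set of merged keywords, and the Jaccard coefficient |A ∩ B| / |A ∪ B| serves as the similarity between two articles.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative_jaccard(docs, n_clusters):
    """Average-linkage agglomerative clustering of keyword sets (a sketch).

    docs: list of sets of (already merged) keywords, one per article.
    n_clusters: stop merging when this many clusters remain (assumed
    stopping rule; a similarity cut-off would be an alternative).
    Returns a list of clusters, each a sorted list of article indices.
    """
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # average pairwise Jaccard similarity between the two clusters
        total = sum(jaccard(docs[i], docs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # merge the most similar pair of clusters (ties: first pair found)
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # safe because i < j
    return clusters


# Toy keyword sets for four articles (illustrative only).
docs = [
    {"microblog", "sentiment lexicon", "opinion mining"},
    {"microblog", "opinion mining", "public opinion"},
    {"aspect-level", "deep learning", "attention"},
    {"aspect-level", "attention", "BERT"},
]
clusters = agglomerative_jaccard(docs, n_clusters=2)
volumes = [len(c) for c in clusters]  # literature volume per topic
print(clusters, volumes)
```

The size of each cluster gives the literature volume per topic. Under this fixed-count stopping rule, the study's 12 categories would correspond to `n_clusters=12`; the naive loop recomputes linkages on every pass, which is fine for a sketch but slow for a corpus of over a thousand articles.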
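For the information gain step, one common formulation (assumed here, since the abstract gives no formula) treats each topic as a one-vs-rest binary label C and scores a word w by IG(w) = H(C) − [P(w)·H(C | w) + P(¬w)·H(C | ¬w)]. Because this score is symmetric, it also rewards words typical of the *other* topics, so the sketch only considers words that occur in at least one article of the topic. The helper names `information_gain` and `top_words` and the toy data are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a discrete distribution given by counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, in_topic, word):
    """IG of the binary feature 'word occurs' for topic-vs-rest labels.

    docs: list of keyword sets; in_topic: list of bools, True when the
    article belongs to the topic under study.
    """
    n = len(docs)
    pos = sum(in_topic)
    h_c = entropy([pos, n - pos])
    with_w = [lab for d, lab in zip(docs, in_topic) if word in d]
    without_w = [lab for d, lab in zip(docs, in_topic) if word not in d]
    h_cond = 0.0
    for part in (with_w, without_w):
        if part:
            p = sum(part)
            h_cond += len(part) / n * entropy([p, len(part) - p])
    return h_c - h_cond


def top_words(docs, in_topic, k=3):
    """Top-k characteristic words of a topic, ties broken alphabetically."""
    # candidate words: only those appearing inside the topic (assumption)
    vocab = set().union(*(d for d, lab in zip(docs, in_topic) if lab))
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, in_topic, w), w))
    return ranked[:k]


docs = [
    {"microblog", "sentiment lexicon", "opinion mining"},
    {"microblog", "opinion mining", "public opinion"},
    {"aspect-level", "deep learning", "attention"},
    {"aspect-level", "attention", "BERT"},
]
in_topic = [True, True, False, False]  # articles 0 and 1 form the topic
print(top_words(docs, in_topic, k=3))
```

Words that perfectly separate the topic from the rest (here "microblog" and "opinion mining") reach the maximum score H(C) = 1 bit for an even split.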
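The abstract does not specify the axes of the four-quadrant topic distribution matrix. The sketch below assumes one plausible design: the horizontal axis is a topic's share of all articles in a stage, the vertical axis is the least-squares trend of its yearly article counts, and the quadrant boundaries are the means across topics. The quadrant labels, the topic names `T1` to `T4`, and the yearly counts are all hypothetical.

```python
from statistics import mean


def slope(years, counts):
    """Least-squares slope of yearly article counts (articles per year)."""
    my, mc = mean(years), mean(counts)
    num = sum((y - my) * (c - mc) for y, c in zip(years, counts))
    den = sum((y - my) ** 2 for y in years)
    return num / den if den else 0.0


def quadrants(yearly, years):
    """Place each topic in a quadrant of (share, trend) for one stage.

    yearly: dict topic -> list of article counts aligned with `years`.
    Assumed axes: x = topic's share of all articles in the stage,
    y = trend of its yearly counts; split points are the means over topics.
    """
    total = sum(sum(c) for c in yearly.values())
    share = {t: sum(c) / total for t, c in yearly.items()}
    trend = {t: slope(years, c) for t, c in yearly.items()}
    x0, y0 = mean(share.values()), mean(trend.values())
    labels = {  # hypothetical quadrant names
        (True, True): "hot",          # large share, growing
        (False, True): "emerging",    # small share, growing
        (True, False): "mature",      # large share, flat or declining
        (False, False): "marginal",   # small share, flat or declining
    }
    return {t: labels[(share[t] >= x0, trend[t] >= y0)] for t in yearly}


# Made-up yearly counts for one stage (2017 to 2022).
years = [2017, 2018, 2019, 2020, 2021, 2022]
yearly = {
    "T1": [10, 12, 15, 18, 22, 25],
    "T2": [20, 19, 18, 18, 17, 16],
    "T3": [1, 2, 3, 5, 6, 8],
    "T4": [5, 4, 4, 3, 3, 2],
}
result = quadrants(yearly, years)
print(result)
```

Calling `quadrants` once with the 2012 to 2016 counts and once with the 2017 to 2022 counts yields two matrices; a topic that moves, say, from "emerging" to "hot" between stages would indicate the kind of shift in research development the comparison is meant to expose.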