Li Hongxia
School of Arts, Shandong Management University, Jinan, China.
Front Psychol. 2022 May 30;13:906928. doi: 10.3389/fpsyg.2022.906928. eCollection 2022.
This paper first surveys and compares the current state of research on text sentiment analysis and potential customer identification, and introduces sentiment dictionary construction, feature selection, feature screening, and common classification algorithms used in text analysis. Second, focusing on the sentiment dictionary, the most widely used tool in sentiment analysis, it studies rules for discriminating the sentiment polarity of sentiment words. To address the shortcomings of relying on a single recognition algorithm when building sentiment dictionaries, it designs an improved integration rule and proposes a method for automatically constructing domain sentiment dictionaries in the social media environment. Third, it analyzes the sentiment topic information contained in user-generated content and incorporates the domain sentiment lexicon into the joint sentiment topic model to extract sentiment topic features; on this basis, it carries out feature engineering for potential customer identification and constructs the feature set. In addition, to handle the class imbalance found in real data, it designs a sample resampling method and a diverse ensemble framework for imbalanced data, which work together on the potential customer identification task under skewed class distributions. Finally, experiments on a social media text corpus validate the proposed methods: both the domain sentiment lexicon construction method and the potential customer identification method based on joint domain sentiment topics perform well against several control groups. The study contributes theoretically to domain sentiment lexicon construction and imbalanced classification, and in practice offers companies a way to discover potential customers.
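The abstract does not spell out the polarity discrimination rule or the improved integration rule. As a hedged illustration only, the sketch below uses Turney-style SO-PMI (semantic orientation from pointwise mutual information with seed words) as one base polarity recognizer, and a weighted-vote threshold as a stand-in integration rule over several recognizers. The function names (`so_pmi`, `polarity`, `integrate`), the seed words, weights, threshold, and toy corpus are all hypothetical and are not taken from the paper.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds):
    """Turney-style semantic orientation: sum of PMI(word, positive seed)
    minus sum of PMI(word, negative seed), using document-level co-occurrence."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def df(*terms):
        # Number of documents that contain every term in `terms`.
        return sum(1 for s in doc_sets if all(t in s for t in terms))

    def pmi(a, b):
        joint = df(a, b)
        if joint == 0:
            # Simplifying assumption: pairs that never co-occur contribute 0.
            return 0.0
        return math.log2(joint * n / (df(a) * df(b)))

    return (sum(pmi(word, p) for p in pos_seeds)
            - sum(pmi(word, q) for q in neg_seeds))


def polarity(score, margin=0.0):
    """Map a real-valued orientation score to +1, -1, or 0 (neutral)."""
    if score > margin:
        return 1
    if score < -margin:
        return -1
    return 0


def integrate(votes, weights, threshold=0.5):
    """Hypothetical integration rule: weighted vote over base recognizers.

    votes   -- polarity labels (+1, -1, 0) from different base algorithms
    weights -- non-negative trust weight for each algorithm
    Returns +1 or -1 only when the normalized vote reaches `threshold`,
    otherwise 0 (the word is left out of the domain lexicon)."""
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * v for v, w in zip(votes, weights)) / total
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return 0


if __name__ == "__main__":
    # Toy corpus of tokenized product reviews (invented for illustration).
    docs = [
        ["good", "fast", "screen"],
        ["good", "fast"],
        ["bad", "laggy"],
        ["bad", "laggy", "screen"],
    ]
    for w in ["fast", "laggy", "screen"]:
        s = so_pmi(w, docs, ["good"], ["bad"])
        print(w, s, polarity(s))
    # Combine the SO-PMI judgment for "fast" with two other (hypothetical) recognizers.
    fast_vote = polarity(so_pmi("fast", docs, ["good"], ["bad"]))
    print(integrate([fast_vote, 1, -1], [2, 1, 1]))
```

Requiring weighted agreement above a threshold, rather than trusting a single recognizer, is one simple way to dampen the errors of any one algorithm; the paper's own integration rule may combine its base algorithms differently.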